/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_sl.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no.
of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274
count of NULL values before imputation
or_mychisq          339
log10_or_mychisq    339
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176
-------------------------------------------------------------
Successfully split data according to scaling law: 1/np.sqrt(x_ncols)
Train data size: (515, 176)
Test data size: 0.07537783614444091 (42, 176)
y_train numbers: Counter({0: 261, 1: 254})
y_train ratio: 1.0275590551181102
y_test_numbers: Counter({0: 21, 1: 21})
y_test ratio: 1.0
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 261, 1: 261})
(522, 176)
Simple Random UnderSampling
Counter({0: 254, 1: 254})
(508, 176)
Simple Combined Over and UnderSampling
Counter({0: 261, 1: 261})
(522, 176)
SMOTE_NC OverSampling
Counter({0: 261, 1: 261})
(522, 176)
#####################################################################
Running ML analysis: scaling law split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_sl/
Sanity checks:
ML source data size: (557, 176)
Total input features: (515, 176)
Target feature numbers: Counter({0: 261, 1: 254})
Target features ratio: 1.0275590551181102
#####################################################################
================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No.
of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
 ColumnTransformer(remainder='passthrough',
 transformers=[('num', MinMaxScaler(),
 Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity',
 ...
 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
 dtype='object', length=169)),
 ('cat', OneHotEncoder(),
 Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
 dtype='object'))])),
 ('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03457308 0.05314136 0.03903151 0.0377934 0.03705144 0.03332138 0.03554845 0.03472877 0.0331893 0.03619957]
mean value: 0.037457823753356934
key: score_time
value: [0.01273632 0.01220393 0.01418066 0.01225758 0.01433253 0.01218915 0.0123353 0.01226211 0.01226807 0.01440454]
mean value: 0.012917017936706543
key: test_mcc
value: [0.76888889 0.61538462 0.84866842 0.84866842 0.77151675 0.88289781 0.76733527 0.88289781 0.80990051 0.69568237]
mean value: 0.7891840882445919
key: train_mcc
value: [0.87494868 0.86615908 0.86178968 0.86190423 0.85751876 0.84920893 0.86645175 0.86645175 0.85783034 0.87499419]
mean value: 0.8637257413526429
key: test_accuracy
value: [0.88461538 0.80769231 0.92307692 0.92307692 0.88461538 0.94117647 0.88235294 0.94117647 0.90196078 0.84313725]
mean value: 0.8932880844645551
key: train_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.93736501 0.93304536 0.93088553 0.93088553 0.9287257 0.92456897 0.93318966 0.93318966 0.92887931 0.9375 ] mean value: 0.9318234713636703 key: test_fscore value: [0.88 0.80769231 0.92 0.92 0.88888889 0.93877551 0.88461538 0.93877551 0.90566038 0.85185185] mean value: 0.8936259830815086 key: train_fscore value: [0.93736501 0.93246187 0.930131 0.93043478 0.92810458 0.92407809 0.93275488 0.93275488 0.92841649 0.93681917] mean value: 0.931332075708447 key: test_precision value: [0.88 0.80769231 0.95833333 0.95833333 0.85714286 0.95833333 0.85185185 0.95833333 0.85714286 0.79310345] mean value: 0.888026665543907 key: train_precision value: [0.92735043 0.92640693 0.92608696 0.92241379 0.92207792 0.91810345 0.92672414 0.92672414 0.92241379 0.93478261] mean value: 0.9253084151397495 key: test_recall value: [0.88 0.80769231 0.88461538 0.88461538 0.92307692 0.92 0.92 0.92 0.96 0.92 ] mean value: 0.902 key: train_recall value: [0.94759825 0.93859649 0.93421053 0.93859649 0.93421053 0.930131 0.93886463 0.93886463 0.93449782 0.93886463] mean value: 0.9374434995786409 key: test_roc_auc value: [0.88444444 0.80769231 0.92307692 0.92307692 0.88461538 0.94076923 0.88307692 0.94076923 0.90307692 0.84461538] mean value: 0.8935213675213675 key: train_roc_auc value: [0.93747434 0.93312803 0.93093505 0.93100037 0.92880739 0.92463997 0.9332621 0.9332621 0.92895104 0.93751742] mean value: 0.9318977817951397 key: test_jcc value: [0.78571429 0.67741935 0.85185185 0.85185185 0.8 0.88461538 0.79310345 0.88461538 0.82758621 0.74193548] mean value: 0.809869325253085 key: train_jcc value: [0.88211382 0.87346939 0.86938776 0.8699187 0.86585366 0.85887097 0.87398374 0.87398374 
0.86639676 0.88114754] mean value: 0.8715126071252873 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), 
('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.84475875 0.97533607 0.87528491 1.0226748 0.90170574 0.9565835 0.97258997 0.88586736 1.06623507 0.90682554] mean value: 0.9407861709594727 key: score_time value: [0.01480126 0.01481223 0.01491356 0.01508093 0.02434945 0.01536441 0.01489663 0.01477122 0.01504946 0.0123744 ] mean value: 0.01564135551452637 key: test_mcc value: [0.80829038 0.65433031 0.84866842 0.88527041 0.73568294 0.88289781 0.80461538 0.92153846 0.8459178 0.65224812] mean value: 0.803946003970833 key: train_mcc value: [0.90510935 0.91374613 0.89664633 0.90072034 0.90072034 0.90085939 0.90549103 0.89669076 0.8968689 0.83622884] mean value: 0.8953081428164044 key: test_accuracy value: [0.90384615 0.82692308 0.92307692 0.94230769 0.86538462 0.94117647 0.90196078 0.96078431 0.92156863 0.82352941] mean value: 0.9010558069381599 key: train_accuracy value: [0.9524838 0.95680346 0.94816415 0.95032397 0.95032397 0.95043103 0.95258621 0.94827586 0.94827586 0.91810345] mean value: 0.9475771765844939 key: test_fscore value: [0.90196078 0.82352941 0.92 0.94117647 
0.87272727 0.93877551 0.90196078 0.96 0.92307692 0.83018868] mean value: 0.9013395836233953 key: train_fscore value: [0.95238095 0.95652174 0.94805195 0.94989107 0.94989107 0.94989107 0.95258621 0.94805195 0.94827586 0.9173913 ] mean value: 0.9472933163543006 key: test_precision value: [0.88461538 0.84 0.95833333 0.96 0.82758621 0.95833333 0.88461538 0.96 0.88888889 0.78571429] mean value: 0.8948086817397162 key: train_precision value: [0.94420601 0.94827586 0.93589744 0.94372294 0.94372294 0.94782609 0.94042553 0.93991416 0.93617021 0.91341991] mean value: 0.9393581102143395 key: test_recall value: [0.92 0.80769231 0.88461538 0.92307692 0.92307692 0.92 0.92 0.96 0.96 0.88 ] mean value: 0.9098461538461539 key: train_recall value: [0.96069869 0.96491228 0.96052632 0.95614035 0.95614035 0.95196507 0.9650655 0.95633188 0.96069869 0.92139738] mean value: 0.9553876503485789 key: test_roc_auc value: [0.90444444 0.82692308 0.92307692 0.94230769 0.86538462 0.94076923 0.90230769 0.96076923 0.92230769 0.82461538] mean value: 0.9012905982905982 key: train_roc_auc value: [0.95257157 0.95692423 0.94834826 0.9504106 0.9504106 0.95045062 0.95274552 0.9483787 0.94843445 0.9181455 ] mean value: 0.9476820048433202 key: test_jcc value: [0.82142857 0.7 0.85185185 0.88888889 0.77419355 0.88461538 0.82142857 0.92307692 0.85714286 0.70967742] mean value: 0.8232304016174984 key: train_jcc value: [0.90909091 0.91666667 0.90123457 0.90456432 0.90456432 0.90456432 0.90946502 0.90123457 0.90163934 0.84738956] mean value: 0.9000413580689495 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01280427 0.01010084 0.00992489 0.00991201 0.01013994 0.01014113 0.01022482 0.01023531 0.01004434 0.01003003] mean value: 0.010355758666992187 key: score_time value: [0.00947094 0.00892138 0.00892901 0.0089221 0.00908446 0.00910974 0.00904751 0.00909019 0.00910616 0.00897598] mean value: 0.009065747261047363 key: test_mcc value: [0.54156684 0.57735027 0.74466871 0.70064905 0.66628253 0.65064936 0.68779719 0.57342193 0.72615385 0.72984534] mean value: 0.6598385073077018 key: train_mcc value: [0.69176702 0.6927847 0.69160663 0.66143964 0.70344863 0.70415149 0.69511551 0.67751955 0.67041841 0.69062182] mean value: 0.6878873412216182 key: test_accuracy value: [0.76923077 0.78846154 0.86538462 0.84615385 0.82692308 0.82352941 0.84313725 0.78431373 0.8627451 0.8627451 ] mean value: 0.827262443438914 key: train_accuracy value: [0.84449244 0.84449244 0.84449244 0.82937365 0.85097192 0.8512931 0.84698276 0.8362069 0.83405172 0.84482759] mean value: 0.8427184963133983 key: test_fscore value: [0.73913043 0.78431373 0.85106383 0.83333333 0.80851064 0.80851064 0.83333333 0.79245283 0.8627451 0.85106383] mean value: 0.8164457691337579 key: train_fscore value: [0.83486239 0.83255814 0.83410138 0.81755196 0.8428246 0.84353741 0.83972912 0.82242991 0.82379863 0.83783784] mean value: 0.83292313777467 key: test_precision value: [0.80952381 0.8 0.95238095 0.90909091 0.9047619 0.86363636 0.86956522 0.75 0.84615385 0.90909091] mean value: 0.8614203912029998 key: train_precision value: [0.87922705 0.88613861 0.87864078 0.86341463 0.87677725 0.87735849 0.86915888 0.88442211 0.86538462 
0.86511628] mean value: 0.8745638703109545 key: test_recall value: [0.68 0.76923077 0.76923077 0.76923077 0.73076923 0.76 0.8 0.84 0.88 0.8 ] mean value: 0.7798461538461539 key: train_recall value: [0.79475983 0.78508772 0.79385965 0.77631579 0.81140351 0.81222707 0.81222707 0.76855895 0.7860262 0.81222707] mean value: 0.7952692867540029 key: test_roc_auc value: [0.76592593 0.78846154 0.86538462 0.84615385 0.82692308 0.82230769 0.84230769 0.78538462 0.86307692 0.86153846] mean value: 0.8267464387464387 key: train_roc_auc value: [0.84396111 0.84360769 0.84373834 0.82858343 0.85038261 0.85079439 0.84653907 0.83534331 0.83343863 0.84441141] mean value: 0.8420799970776743 key: test_jcc value: [0.5862069 0.64516129 0.74074074 0.71428571 0.67857143 0.67857143 0.71428571 0.65625 0.75862069 0.74074074] mean value: 0.6913434643725245 key: train_jcc value: [0.71653543 0.71314741 0.71541502 0.69140625 0.72834646 0.72941176 0.72373541 0.6984127 0.70038911 0.72093023] mean value: 0.7137729779180588 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01047707 0.01029992 0.01025939 0.01041651 0.01023507 0.0102303 0.0112865 0.01102686 0.01036859 0.01120663] mean value: 0.010580682754516601 key: score_time value: [0.00895643 0.00900006 0.00895834 0.00900507 0.00899935 0.00976944 0.00984073 0.00931478 0.00945687 0.00984907] mean value: 0.009315013885498047 key: test_mcc value: [0.57831366 0.4233902 0.69230769 0.71151247 0.73568294 0.72573276 0.80461538 0.88307692 0.60769231 0.64715023] mean value: 0.6809474562124411 key: train_mcc value: [0.69835966 0.77538376 0.72791401 0.72780737 0.77538491 0.75470857 0.75496039 0.72841838 0.75426257 0.74133606] mean value: 0.7438535657906272 key: test_accuracy value: [0.78846154 0.71153846 0.84615385 0.84615385 0.86538462 0.8627451 0.90196078 0.94117647 0.80392157 0.82352941] mean value: 0.8391025641025641 key: train_accuracy value: [0.8488121 0.88768898 0.86393089 0.86393089 0.88768898 0.87715517 0.87715517 0.86422414 0.87715517 0.87068966] mean value: 0.8718431146197959 key: test_fscore value: [0.76595745 0.70588235 0.84615385 0.82608696 0.87272727 0.85714286 0.90196078 0.94117647 0.8 0.81632653] mean value: 0.8333414517809609 key: train_fscore value: [0.84304933 0.88495575 0.8627451 0.86092715 0.88646288 0.87741935 0.87794433 0.86153846 0.87527352 0.86899563] mean value: 0.8699311510042489 key: test_precision value: [0.81818182 0.72 0.84615385 0.95 0.82758621 0.875 0.88461538 0.92307692 0.8 0.83333333] mean value: 0.8477947512257857 key: train_precision value: [0.86635945 0.89285714 0.85714286 0.86666667 0.8826087 0.86440678 0.86134454 0.86725664 0.87719298 0.86899563] mean value: 
0.8704831379611647 key: test_recall value: [0.72 0.69230769 0.84615385 0.73076923 0.92307692 0.84 0.92 0.96 0.8 0.8 ] mean value: 0.8232307692307692 key: train_recall value: [0.8209607 0.87719298 0.86842105 0.85526316 0.89035088 0.89082969 0.89519651 0.8558952 0.87336245 0.86899563] mean value: 0.8696468244847928 key: test_roc_auc value: [0.78592593 0.71153846 0.84615385 0.84615385 0.86538462 0.86230769 0.90230769 0.94153846 0.80384615 0.82307692] mean value: 0.8388233618233618 key: train_roc_auc value: [0.84851454 0.88753266 0.86399776 0.86380179 0.88772863 0.87732974 0.87738549 0.86411781 0.87710675 0.87066803] mean value: 0.8718183204075174 key: test_jcc value: [0.62068966 0.54545455 0.73333333 0.7037037 0.77419355 0.75 0.82142857 0.88888889 0.66666667 0.68965517] mean value: 0.7194014085449013 key: train_jcc value: [0.72868217 0.79365079 0.75862069 0.75581395 0.79607843 0.7816092 0.78244275 0.75675676 0.77821012 0.76833977] mean value: 0.7700204624031467 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01085258 0.01142216 0.01102972 0.010952 0.0106256 0.01055479 0.01059937 0.0103457 0.01040864 0.01033592] mean value: 0.010712647438049316 key: score_time value: [0.07376838 0.01394439 0.01408267 0.01353168 0.01317382 0.01507711 0.01556063 0.01267171 0.01504016 0.01285124] mean value: 0.019970178604125977 key: test_mcc value: [0.61551019 0.38575837 0.38575837 0.5 0.53846154 0.5372904 0.65064936 0.72573276 0.60769231 0.45474301] mean value: 0.5401596315962462 key: train_mcc value: [0.70231538 0.72376727 0.69801004 0.71054252 0.71511629 0.68110244 0.67701807 0.68110244 0.71576891 0.69138045] mean value: 0.6996123825721399 key: test_accuracy value: [0.80769231 0.69230769 0.69230769 0.73076923 0.76923077 0.76470588 0.82352941 0.8627451 0.80392157 0.7254902 ] mean value: 0.7672699849170437 key: train_accuracy value: [0.85097192 0.86177106 0.8488121 0.85529158 0.8574514 0.84051724 0.83836207 0.84051724 0.85775862 0.84482759] mean value: 0.8496280814776197 key: test_fscore value: [0.79166667 0.68 0.7037037 0.66666667 0.76923077 0.77777778 0.80851064 0.85714286 0.8 0.69565217] mean value: 0.7550351253399358 key: train_fscore value: [0.84632517 0.85714286 0.84304933 0.85339168 0.85267857 0.83628319 0.83296214 0.83628319 0.85333333 0.83636364] mean value: 0.8447813087328101 key: test_precision value: [0.82608696 0.70833333 0.67857143 0.875 0.76923077 0.72413793 0.86363636 0.875 0.8 0.76190476] mean value: 0.7881901544232879 key: train_precision value: [0.86363636 0.87272727 0.86238532 0.85152838 0.86818182 0.84753363 0.85 0.84753363 0.86877828 0.87203791] mean value: 
0.8604342619734768 key: test_recall value: [0.76 0.65384615 0.73076923 0.53846154 0.76923077 0.84 0.76 0.84 0.8 0.64 ] mean value: 0.7332307692307692 key: train_recall value: [0.82969432 0.84210526 0.8245614 0.85526316 0.8377193 0.82532751 0.81659389 0.82532751 0.83842795 0.80349345] mean value: 0.8298513751627978 key: test_roc_auc value: [0.80592593 0.69230769 0.69230769 0.73076923 0.76923077 0.76615385 0.82230769 0.86230769 0.80384615 0.72384615] mean value: 0.7669002849002848 key: train_roc_auc value: [0.8507446 0.86147816 0.84845091 0.85529115 0.85715752 0.84032333 0.83808418 0.84032333 0.85751185 0.84429992] mean value: 0.8493664950009298 key: test_jcc value: [0.65517241 0.51515152 0.54285714 0.5 0.625 0.63636364 0.67857143 0.75 0.66666667 0.53333333] mean value: 0.6103116136736826 key: train_jcc value: [0.73359073 0.75 0.72868217 0.74427481 0.74319066 0.71863118 0.71374046 0.71863118 0.74418605 0.71875 ] mean value: 0.7313677236713617 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02325892 0.02017713 0.02105737 0.02158213 0.02060938 0.02063775 0.0231297 0.02031779 0.02411246 0.02205038] mean value: 0.0216933012008667 key: score_time value: [0.0124054 0.0114367 0.01170635 0.01159406 0.01113749 0.01143193 0.01157951 0.01144934 0.01246667 0.0119288 ] mean value: 0.011713624000549316 key: test_mcc value: [0.80829038 0.65433031 0.84866842 0.84866842 0.80829038 0.84544958 0.76733527 0.92153846 0.76733527 0.68875274] mean value: 0.7958659239597995 key: train_mcc value: [0.79696947 0.81423213 0.7927817 0.7927817 0.79695053 0.79323288 0.79739862 0.78885906 0.80169098 0.81031311] mean value: 0.7985210167824943 key: test_accuracy value: [0.90384615 0.82692308 0.92307692 0.92307692 0.90384615 0.92156863 0.88235294 0.96078431 0.88235294 0.84313725] mean value: 0.8970965309200604 key: train_accuracy value: [0.89848812 0.90712743 0.89632829 0.89632829 0.89848812 0.89655172 0.8987069 0.89439655 0.90086207 0.90517241] mean value: 0.8992449914351679 key: test_fscore value: [0.90196078 0.83018868 0.92 0.92 0.90566038 0.91666667 0.88461538 0.96 0.88461538 0.84615385] mean value: 0.8969861122968781 key: train_fscore value: [0.89760349 0.9059081 0.89565217 0.89565217 0.89715536 0.8961039 0.89760349 0.89370933 0.89956332 0.90393013] mean value: 0.8982881450268425 key: test_precision value: [0.88461538 0.81481481 0.95833333 0.95833333 0.88888889 0.95652174 0.85185185 0.96 0.85185185 0.81481481] mean value: 0.8940026012634709 key: train_precision value: [0.89565217 0.90393013 0.88793103 0.88793103 0.89519651 0.88841202 0.89565217 0.88793103 0.89956332 0.90393013] 
mean value: 0.894612955577799 key: test_recall value: [0.92 0.84615385 0.88461538 0.88461538 0.92307692 0.88 0.92 0.96 0.92 0.88 ] mean value: 0.9018461538461539 key: train_recall value: [0.89956332 0.90789474 0.90350877 0.90350877 0.89912281 0.90393013 0.89956332 0.89956332 0.89956332 0.90393013] mean value: 0.9020148624837202 key: test_roc_auc value: [0.90444444 0.82692308 0.92307692 0.92307692 0.90384615 0.92076923 0.88307692 0.96076923 0.88307692 0.84384615] mean value: 0.8972905982905983 key: train_roc_auc value: [0.89849961 0.90713886 0.89643524 0.89643524 0.89849757 0.89664592 0.89871783 0.89446251 0.90084549 0.90515655] mean value: 0.8992834814328039 key: test_jcc value: [0.82142857 0.70967742 0.85185185 0.85185185 0.82758621 0.84615385 0.79310345 0.92307692 0.79310345 0.73333333] mean value: 0.8151166900499492 key: train_jcc value: [0.81422925 0.828 0.81102362 0.81102362 0.81349206 0.81176471 0.81422925 0.80784314 0.81746032 0.8247012 ] mean value: 0.8153767161426962 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.01321244 1.85925555 2.15802002 2.02537608 2.0681262 2.02303672 2.0381639 1.67555785 1.83844972 2.44499278] mean value: 2.01441912651062 key: score_time value: [0.01252031 0.01444769 0.01506567 0.01446342 0.01447082 0.01439071 0.01457739 0.01256704 0.01462984 0.0286572 ] mean value: 0.015579009056091308 key: test_mcc value: [0.69185185 0.69436507 0.65433031 0.81312325 0.81312325 0.88307692 0.84544958 0.88289781 0.76733527 0.68875274] mean value: 0.7734306059974984 key: train_mcc value: [0.99568893 0.99568893 1. 0.99568893 1. 0.98714723 1. 0.97003963 1. 0.99137787] mean value: 0.9935631530556671 key: test_accuracy value: [0.84615385 0.84615385 0.82692308 0.90384615 0.90384615 0.94117647 0.92156863 0.94117647 0.88235294 0.84313725] mean value: 0.8856334841628959 key: train_accuracy value: [0.99784017 0.99784017 1. 0.99784017 1. 0.99353448 1. 0.98491379 1. 0.99568966] mean value: 0.9967658449393014 key: test_fscore value: [0.84 0.84 0.82352941 0.89795918 0.90909091 0.94117647 0.91666667 0.93877551 0.88461538 0.84615385] mean value: 0.8837967382757299 key: train_fscore value: [0.99781182 0.99781182 1. 0.99781182 1. 0.99340659 1. 0.98454746 1. 
0.99563319] mean value: 0.9967022691125853 key: test_precision value: [0.84 0.875 0.84 0.95652174 0.86206897 0.92307692 0.95652174 0.95833333 0.85185185 0.81481481] mean value: 0.8878189366855034 key: train_precision value: [1. 0.99563319 1. 0.99563319 1. 1. 1. 0.99553571 1. 0.99563319] mean value: 0.9982435277604491 key: test_recall value: [0.84 0.80769231 0.80769231 0.84615385 0.96153846 0.96 0.88 0.92 0.92 0.88 ] mean value: 0.8823076923076923 key: train_recall value: [0.99563319 1. 1. 1. 1. 0.98689956 1. 0.97379913 1. 0.99563319] mean value: 0.9951965065502183 key: test_roc_auc value: [0.84592593 0.84615385 0.82692308 0.90384615 0.90384615 0.94153846 0.92076923 0.94076923 0.88307692 0.84384615] mean value: 0.8856695156695157 key: train_roc_auc value: [0.99781659 0.99787234 1. 0.99787234 1. 0.99344978 1. 0.9847719 1. 0.99568893] mean value: 0.9967471894453219 key: test_jcc value: [0.72413793 0.72413793 0.7 0.81481481 0.83333333 0.88888889 0.84615385 0.88461538 0.79310345 0.73333333] mean value: 0.7942518911484429 key: train_jcc value: [0.99563319 0.99563319 1. 0.99563319 1. 0.98689956 1. 0.96956522 1. 
0.99130435] mean value: 0.9934668691854945 MCC on Blind test: 0.75 Accuracy on Blind test: 0.86 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02841353 0.02183843 0.02099395 0.01935482 0.01939464 0.0214653 0.02150893 0.02194858 0.02061439 0.02167058] mean value: 0.021720314025878908 key: score_time value: [0.01223016 0.00936103 0.00875926 0.00877237 0.00869417 0.0087049 0.00894022 0.00874114 0.00898623 0.00875068] mean value: 0.009194016456604004 key: test_mcc value: [0.80829038 0.81312325 0.92307692 0.88527041 0.89056356 0.96153846 0.80431528 0.80461538 0.76461538 0.96148034] mean value: 0.8616889371976575 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90384615 0.90384615 0.96153846 0.94230769 0.94230769 0.98039216 0.90196078 0.90196078 0.88235294 0.98039216] mean value: 0.9300904977375566 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90196078 0.89795918 0.96153846 0.94339623 0.94545455 0.98039216 0.89795918 0.90196078 0.88 0.97959184] mean value: 0.929021316297993 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.88461538 0.95652174 0.96153846 0.92592593 0.89655172 0.96153846 0.91666667 0.88461538 0.88 1. ] mean value: 0.9267973748168651 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92 0.84615385 0.96153846 0.96153846 1. 1. 0.88 0.92 0.88 0.96 ] mean value: 0.932923076923077 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90444444 0.90384615 0.96153846 0.94230769 0.94230769 0.98076923 0.90153846 0.90230769 0.88230769 0.98 ] mean value: 0.9301367521367522 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.82142857 0.81481481 0.92592593 0.89285714 0.89655172 0.96153846 0.81481481 0.82142857 0.78571429 0.96 ] mean value: 0.8695074312660519 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.93 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.11908174 0.11891198 0.12085533 0.12205338 0.12058616 0.12348747 0.12098384 0.12022972 0.11956501 0.12257099] mean value: 0.12083256244659424 key: score_time value: [0.01758242 0.01875472 0.01774883 0.01888084 0.01758361 0.0192914 0.01767397 0.0176754 0.01803231 0.0176661 ] mean value: 0.01808896064758301 key: test_mcc value: [0.7364532 0.69230769 0.88527041 0.77849894 0.88527041 0.84544958 0.80461538 0.88289781 0.72573276 0.65224812] mean value: 0.7888744323177563 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86538462 0.84615385 0.94230769 0.88461538 0.94230769 0.92156863 0.90196078 0.94117647 0.8627451 0.82352941] mean value: 0.8931749622926093 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86792453 0.84615385 0.94117647 0.875 0.94339623 0.91666667 0.90196078 0.93877551 0.85714286 0.83018868] mean value: 0.8918385569031677 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.82142857 0.84615385 0.96 0.95454545 0.92592593 0.95652174 0.88461538 0.95833333 0.875 0.78571429] mean value: 0.8968238540847236 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92 0.84615385 0.92307692 0.80769231 0.96153846 0.88 0.92 0.92 0.84 0.88 ] mean value: 0.8898461538461538 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.86740741 0.84615385 0.94230769 0.88461538 0.94230769 0.92076923 0.90230769 0.94076923 0.86230769 0.82461538] mean value: 0.8933561253561253 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.76666667 0.73333333 0.88888889 0.77777778 0.89285714 0.84615385 0.82142857 0.88461538 0.75 0.70967742] mean value: 0.807139903107645 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01038766 0.01013541 0.01010156 0.01030636 0.01012969 0.01005673 0.0101459 0.0112083 0.01133013 0.01134515] mean value: 0.01051468849182129 key: score_time value: [0.00891232 0.00864291 0.00884056 0.00908256 0.008775 0.00878358 0.0087533 0.00932074 0.00876355 0.00923038] mean value: 0.008910489082336426 key: test_mcc value: [0.54074074 0.27104108 0.58080232 0.40422604 0.66628253 0.33282012 0.5685677 0.64769231 0.49076923 0.61017022] mean value: 0.5113112288991031 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.76923077 0.63461538 0.78846154 0.69230769 0.82692308 0.66666667 0.78431373 0.82352941 0.74509804 0.80392157] mean value: 0.7535067873303167 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76923077 0.65454545 0.7755102 0.63636364 0.84210526 0.65306122 0.7755102 0.82352941 0.74509804 0.80769231] mean value: 0.7482646514623515 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.74074074 0.62068966 0.82608696 0.77777778 0.77419355 0.66666667 0.79166667 0.80769231 0.73076923 0.77777778] mean value: 0.7514061328172418 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8 0.69230769 0.73076923 0.53846154 0.92307692 0.64 0.76 0.84 0.76 0.84 ] mean value: 0.7524615384615385 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77037037 0.63461538 0.78846154 0.69230769 0.82692308 0.66615385 0.78384615 0.82384615 0.74538462 0.80461538] mean value: 0.7536524216524216 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.625 0.48648649 0.63333333 0.46666667 0.72727273 0.48484848 0.63333333 0.7 0.59375 0.67741935] mean value: 0.6028110386779741 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.78404307 1.84690738 1.85415792 1.84416199 1.86087155 1.84431767 1.88158655 1.84635663 1.76327705 1.80362248] mean value: 1.8329302310943603 key: score_time value: [0.09506845 0.09978175 0.10103059 0.09958267 0.0994401 0.10090494 0.10089231 0.09912086 0.09177494 0.10069799] mean value: 0.09882946014404297 key: test_mcc value: [0.84888889 0.84866842 0.96225045 0.92307692 0.9258201 1. 0.92427578 1. 0.88307692 0.88307692] mean value: 0.9199134409597195 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.92307692 0.92307692 0.98076923 0.96153846 0.96153846 1. 0.96078431 1. 0.94117647 0.94117647] mean value: 0.9593137254901961 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92307692 0.92 0.98039216 0.96153846 0.96296296 1. 0.95833333 1. 0.94117647 0.94117647] mean value: 0.9588656778950897 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.95833333 1. 
0.96153846 0.92857143 1. 1. 1. 0.92307692 0.92307692] mean value: 0.9583485958485959 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96 0.88461538 0.96153846 0.96153846 1. 1. 0.92 1. 0.96 0.96 ] mean value: 0.9607692307692308 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.92444444 0.92307692 0.98076923 0.96153846 0.96153846 1. 0.96 1. 0.94153846 0.94153846] mean value: 0.9594444444444444 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.85714286 0.85185185 0.96153846 0.92592593 0.92857143 1. 0.92 1. 0.88888889 0.88888889] mean value: 0.9222808302808303 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( key: fit_time value: [1.84273124 0.98533678 1.06106544 0.9535141 1.02706265 0.97413754 0.97359943 0.98847985 1.03109789 0.98040366] mean value: 1.0817428588867188 key: score_time value: [0.2416172 0.22115898 0.2508235 0.22502398 0.17808771 0.23984313 0.2546699 0.28216171 0.22378087 0.21789837] mean value: 0.23350653648376465 key: test_mcc value: [0.813662 0.76923077 0.96225045 0.92307692 0.88527041 1. 0.92427578 1. 0.88307692 0.88307692] mean value: 0.9043920181288048 key: train_mcc value: [0.95683011 0.95247872 0.95679358 0.96112065 0.95682367 0.94826721 0.95258977 0.9482967 0.95258977 0.95692011] mean value: 0.9542710305666754 key: test_accuracy value: [0.90384615 0.88461538 0.98076923 0.96153846 0.94230769 1. 0.96078431 1. 0.94117647 0.94117647] mean value: 0.9516214177978883 key: train_accuracy value: [0.97840173 0.9762419 0.97840173 0.98056156 0.97840173 0.97413793 0.9762931 0.97413793 0.9762931 0.97844828] mean value: 0.9771318984136441 key: test_fscore value: [0.90566038 0.88461538 0.98039216 0.96153846 0.94339623 1. 0.95833333 1. 0.94117647 0.94117647] mean value: 0.951628888129998 key: train_fscore value: [0.97807018 0.97582418 0.97807018 0.98021978 0.97797357 0.97379913 0.97603486 0.97368421 0.97603486 0.97807018] mean value: 0.9767781104581152 key: test_precision value: [0.85714286 0.88461538 1. 0.96153846 0.92592593 1. 1. 1. 
0.92307692 0.92307692] mean value: 0.9475376475376476 key: train_precision value: [0.98237885 0.97797357 0.97807018 0.98237885 0.98230088 0.97379913 0.97391304 0.97797357 0.97391304 0.98237885] mean value: 0.9785079974428954 key: test_recall value: [0.96 0.88461538 0.96153846 0.96153846 0.96153846 1. 0.92 1. 0.96 0.96 ] mean value: 0.9569230769230769 key: train_recall value: [0.97379913 0.97368421 0.97807018 0.97807018 0.97368421 0.97379913 0.97816594 0.96943231 0.97816594 0.97379913] mean value: 0.9750670343982226 key: test_roc_auc value: [0.90592593 0.88461538 0.98076923 0.96153846 0.94230769 1. 0.96 1. 0.94153846 0.94153846] mean value: 0.9518233618233618 key: train_roc_auc value: [0.97835255 0.97620381 0.97839679 0.98052445 0.97833147 0.97413361 0.97631701 0.97407786 0.97631701 0.97838893] mean value: 0.9771043482593041 key: test_jcc value: [0.82758621 0.79310345 0.96153846 0.92592593 0.89285714 1. 0.92 1. 0.88888889 0.88888889] mean value: 0.9098788963271722 key: train_jcc value: [0.95708155 0.9527897 0.95708155 0.9612069 0.95689655 0.94893617 0.95319149 0.94871795 0.95319149 0.95708155] mean value: 0.954617488069393 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02028584 0.01054978 0.01048946 0.01108623 0.01054072 0.01161528 0.0117743 0.01121378 0.01106954 0.01049757] mean value: 0.011912250518798828 key: score_time value: [0.01542568 0.00919604 0.00922108 0.00998044 0.00957513 0.00999999 0.00992775 0.00970435 0.00996804 0.00929213] mean value: 0.010229063034057618 key: test_mcc value: [0.57831366 0.4233902 0.69230769 0.71151247 0.73568294 0.72573276 0.80461538 0.88307692 0.60769231 0.64715023] mean value: 0.6809474562124411 key: train_mcc value: [0.69835966 0.77538376 0.72791401 0.72780737 0.77538491 0.75470857 0.75496039 0.72841838 0.75426257 0.74133606] mean value: 0.7438535657906272 key: test_accuracy value: [0.78846154 0.71153846 0.84615385 0.84615385 0.86538462 0.8627451 0.90196078 0.94117647 0.80392157 0.82352941] mean value: 0.8391025641025641 key: train_accuracy value: [0.8488121 0.88768898 0.86393089 0.86393089 0.88768898 0.87715517 0.87715517 0.86422414 0.87715517 0.87068966] mean value: 0.8718431146197959 key: test_fscore value: [0.76595745 0.70588235 0.84615385 0.82608696 0.87272727 0.85714286 0.90196078 0.94117647 0.8 0.81632653] mean value: 0.8333414517809609 key: train_fscore value: [0.84304933 0.88495575 0.8627451 0.86092715 0.88646288 0.87741935 0.87794433 0.86153846 0.87527352 0.86899563] mean value: 0.8699311510042489 key: test_precision value: [0.81818182 0.72 0.84615385 0.95 0.82758621 0.875 0.88461538 0.92307692 0.8 0.83333333] mean value: 0.8477947512257857 key: train_precision value: [0.86635945 0.89285714 0.85714286 0.86666667 0.8826087 0.86440678 0.86134454 0.86725664 0.87719298 0.86899563] mean 
value: 0.8704831379611647 key: test_recall value: [0.72 0.69230769 0.84615385 0.73076923 0.92307692 0.84 0.92 0.96 0.8 0.8 ] mean value: 0.8232307692307692 key: train_recall value: [0.8209607 0.87719298 0.86842105 0.85526316 0.89035088 0.89082969 0.89519651 0.8558952 0.87336245 0.86899563] mean value: 0.8696468244847928 key: test_roc_auc value: [0.78592593 0.71153846 0.84615385 0.84615385 0.86538462 0.86230769 0.90230769 0.94153846 0.80384615 0.82307692] mean value: 0.8388233618233618 key: train_roc_auc value: [0.84851454 0.88753266 0.86399776 0.86380179 0.88772863 0.87732974 0.87738549 0.86411781 0.87710675 0.87066803] mean value: 0.8718183204075174 key: test_jcc value: [0.62068966 0.54545455 0.73333333 0.7037037 0.77419355 0.75 0.82142857 0.88888889 0.66666667 0.68965517] mean value: 0.7194014085449013 key: train_jcc value: [0.72868217 0.79365079 0.75862069 0.75581395 0.79607843 0.7816092 0.78244275 0.75675676 0.77821012 0.76833977] mean value: 0.7700204624031467 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision 
Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.15760708 0.06621742 0.07752442 0.07321143 0.0758338 0.08350158 0.07533073 0.07601404 0.06880569 0.07532454] mean value: 0.08293707370758056 key: score_time value: [0.0113287 0.01082397 0.01109338 0.01086092 0.01102328 0.01106405 0.0109849 0.01106334 0.01134181 0.01326489] mean value: 0.011284923553466797 key: test_mcc value: [0.89087081 0.84866842 0.96225045 0.92307692 0.9258201 1. 0.92153846 1. 0.88307692 0.92427578] mean value: 0.9279577865544593 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94230769 0.92307692 0.98076923 0.96153846 0.96153846 1. 0.96078431 1. 0.94117647 0.96078431] mean value: 0.9631975867269985 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94339623 0.92 0.98039216 0.96153846 0.96296296 1. 0.96 1. 0.94117647 0.95833333] mean value: 0.9627799611700832 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 0.95833333 1. 0.96153846 0.92857143 1. 0.96 1. 0.92307692 1. ] mean value: 0.9624377289377289 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.88461538 0.96153846 0.96153846 1. 1. 0.96 1. 0.96 0.92 ] mean value: 0.9647692307692308 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.92307692 0.98076923 0.96153846 0.96153846 1. 0.96076923 1. 
0.94153846 0.96 ] mean value: 0.9633675213675214 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.89285714 0.85185185 0.96153846 0.92592593 0.92857143 1. 0.92307692 1. 0.88888889 0.92 ] mean value: 0.9292710622710623 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, 
random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05204272 0.08309412 0.08293486 0.09231877 0.06768751 0.06185079 0.0759449 0.04235697 0.07767081 0.07520199] mean value: 0.07111034393310547 key: score_time value: [0.01879072 0.03160286 0.01222348 0.02677679 0.01260066 0.01890087 0.01229119 0.01862192 0.02106357 0.01877522] mean value: 0.019164729118347167 key: test_mcc value: [0.77185185 0.61538462 0.84866842 0.77151675 0.84866842 0.88289781 0.64769231 0.80904133 0.73107432 0.6610182 ] mean value: 0.7587814027095846 key: train_mcc value: [0.91800556 0.90108249 0.91392793 0.91358716 0.91358716 0.90520077 0.90129433 0.9009374 0.91411317 0.91393374] mean value: 0.9095669708390538 key: test_accuracy value: [0.88461538 0.80769231 0.92307692 0.88461538 0.92307692 0.94117647 0.82352941 0.90196078 0.8627451 0.82352941] mean value: 0.8776018099547511 key: 
train_accuracy value: [0.95896328 0.95032397 0.95680346 0.95680346 0.95680346 0.95258621 0.95043103 0.95043103 0.95689655 0.95689655] mean value: 0.954693900350041 key: test_fscore value: [0.88461538 0.80769231 0.92 0.88 0.92592593 0.93877551 0.82352941 0.89361702 0.86792453 0.83636364] mean value: 0.8778443726144525 key: train_fscore value: [0.95878525 0.95032397 0.95670996 0.95614035 0.95614035 0.95217391 0.95053763 0.95010846 0.95689655 0.95670996] mean value: 0.954452639776014 key: test_precision value: [0.85185185 0.80769231 0.95833333 0.91666667 0.89285714 0.95833333 0.80769231 0.95454545 0.82142857 0.76666667] mean value: 0.8736067636067636 key: train_precision value: [0.95258621 0.93617021 0.94444444 0.95614035 0.95614035 0.94805195 0.93644068 0.94396552 0.94468085 0.94849785] mean value: 0.9467118414261851 key: test_recall value: [0.92 0.80769231 0.88461538 0.84615385 0.96153846 0.92 0.84 0.84 0.92 0.92 ] mean value: 0.886 key: train_recall value: [0.9650655 0.96491228 0.96929825 0.95614035 0.95614035 0.95633188 0.9650655 0.95633188 0.96943231 0.9650655 ] mean value: 0.962378380448939 key: test_roc_auc value: [0.88592593 0.80769231 0.92307692 0.88461538 0.92307692 0.94076923 0.82384615 0.90076923 0.86384615 0.82538462] mean value: 0.8779002849002849 key: train_roc_auc value: [0.95902848 0.95054125 0.95698955 0.95679358 0.95679358 0.95263402 0.95061786 0.95050636 0.95705658 0.95700084] mean value: 0.9547962096825529 key: test_jcc value: [0.79310345 0.67741935 0.85185185 0.78571429 0.86206897 0.88461538 0.7 0.80769231 0.76666667 0.71875 ] mean value: 0.784788226517231 key: train_jcc value: [0.92083333 0.90534979 0.91701245 0.91596639 0.91596639 0.90871369 0.9057377 0.90495868 0.91735537 0.91701245] mean value: 0.9128906244397688 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01436949 0.01111484 0.01016617 0.01088428 0.00985456 0.00989771 0.01070356 0.01117134 0.01106 0.01109123] mean value: 0.011031317710876464 key: score_time value: [0.01223636 0.00920248 0.00911593 0.00875282 0.00875449 0.00877905 0.00953007 0.00965476 0.00958037 0.00875211] mean value: 0.00943584442138672 key: test_mcc value: [0.65330526 0.54006172 0.84866842 0.70064905 0.80829038 0.72573276 0.72573276 0.72615385 0.68875274 0.608971 ] mean value: 0.7026317935253169 key: train_mcc value: [0.67238923 0.72423761 0.71496629 0.70646532 0.72361387 0.74138866 0.69411122 0.70713779 0.70314599 0.71561406] mean value: 0.7103070036060428 key: test_accuracy value: [0.82692308 0.76923077 0.92307692 0.84615385 0.90384615 0.8627451 0.8627451 0.8627451 0.84313725 0.80392157] mean value: 0.8504524886877828 key: train_accuracy value: [0.83585313 0.86177106 0.8574514 0.85313175 0.86177106 0.87068966 0.84698276 0.85344828 0.8512931 0.85775862] mean value: 0.8550150815520965 key: test_fscore value: [0.81632653 0.76 0.92 0.83333333 0.90196078 0.85714286 0.85714286 0.8627451 0.84615385 0.79166667] mean value: 0.8446471973404747 key: train_fscore value: [0.82959641 0.85585586 0.85333333 0.84821429 0.85777778 0.86784141 0.84257206 0.84888889 0.84563758 0.8539823 ] mean value: 0.8503699910679656 key: test_precision value: [0.83333333 0.79166667 0.95833333 0.90909091 0.92 0.875 0.875 0.84615385 
0.81481481 0.82608696] mean value: 0.8649479859914643 key: train_precision value: [0.85253456 0.87962963 0.86486486 0.86363636 0.86936937 0.87555556 0.85585586 0.86425339 0.86697248 0.86547085] mean value: 0.8658142923870936 key: test_recall value: [0.8 0.73076923 0.88461538 0.76923077 0.88461538 0.84 0.84 0.88 0.88 0.76 ] mean value: 0.8269230769230769 key: train_recall value: [0.80786026 0.83333333 0.84210526 0.83333333 0.84649123 0.86026201 0.82969432 0.83406114 0.82532751 0.84279476] mean value: 0.8355263157894737 key: test_roc_auc value: [0.82592593 0.76923077 0.92307692 0.84615385 0.90384615 0.86230769 0.86230769 0.86307692 0.84384615 0.80307692] mean value: 0.8502849002849002 key: train_roc_auc value: [0.83555406 0.86134752 0.85722284 0.85283688 0.86154349 0.87055654 0.84676206 0.85320078 0.85096163 0.85756759] mean value: 0.8547553382911727 key: test_jcc value: [0.68965517 0.61290323 0.85185185 0.71428571 0.82142857 0.75 0.75 0.75862069 0.73333333 0.65517241] mean value: 0.7337250972567991 key: train_jcc value: [0.70881226 0.7480315 0.74418605 0.73643411 0.75097276 0.76653696 0.72796935 0.73745174 0.73255814 0.74517375] mean value: 0.7398126610083979 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01407576 0.01649785 0.02283406 0.02057886 0.01880026 0.02140307 0.02072906 0.02038002 0.02303767 0.01901793] mean value: 0.01973545551300049 key: score_time value: [0.01000118 0.01126051 0.01198721 0.01775503 0.01397157 0.01187444 0.01192546 0.01181602 0.01192856 0.0117836 ] mean value: 0.012430357933044433 key: test_mcc value: [0.67524617 0.65433031 0.81312325 0.88527041 0.73131034 0.76662339 0.76733527 0.73878883 0.74071542 0.65224812] mean value: 0.7424991517164395 key: train_mcc value: [0.76102063 0.87912177 0.90936066 0.85585682 0.85829967 0.89258812 0.86828293 0.81782174 0.8647866 0.8793363 ] mean value: 0.8586475244728068 key: test_accuracy value: [0.82692308 0.82692308 0.90384615 0.94230769 0.86538462 0.88235294 0.88235294 0.8627451 0.8627451 0.82352941] mean value: 0.8679110105580694 key: train_accuracy value: [0.86825054 0.93952484 0.95464363 0.92656587 0.92656587 0.94612069 0.93318966 0.90517241 0.93103448 0.93965517] mean value: 0.9270723169732629 key: test_fscore value: [0.79069767 0.82352941 0.89795918 0.94339623 0.8627451 0.875 0.88461538 0.84444444 0.87272727 0.83018868] mean value: 0.8625303375343475 key: train_fscore value: [0.84711779 0.9380531 0.95424837 0.92827004 0.92093023 0.94456763 0.93446089 0.89671362 0.93277311 0.93913043] mean value: 0.923626520709015 key: test_precision value: [0.94444444 0.84 0.95652174 0.92592593 0.88 0.91304348 0.85185185 0.95 0.8 0.78571429] mean value: 0.8847501725327812 key: train_precision value: [0.99411765 0.94642857 0.94805195 0.89430894 0.98019802 0.95945946 0.9057377 
0.96954315 0.89878543 0.93506494] mean value: 0.9431695801182518 key: test_recall value: [0.68 0.80769231 0.84615385 0.96153846 0.84615385 0.84 0.92 0.76 0.96 0.88 ] mean value: 0.8501538461538461 key: train_recall value: [0.73799127 0.92982456 0.96052632 0.96491228 0.86842105 0.930131 0.9650655 0.83406114 0.96943231 0.94323144] mean value: 0.9103596874281774 key: test_roc_auc value: [0.82148148 0.82692308 0.90384615 0.94230769 0.86538462 0.88153846 0.88307692 0.86076923 0.86461538 0.82461538] mean value: 0.8674558404558405 key: train_roc_auc value: [0.86685888 0.93938037 0.95473124 0.92713699 0.92569989 0.94591657 0.93359658 0.90426461 0.93152467 0.93970083] mean value: 0.9268810621174348 key: test_jcc value: [0.65384615 0.7 0.81481481 0.89285714 0.75862069 0.77777778 0.79310345 0.73076923 0.77419355 0.70967742] mean value: 0.760566022573809 key: train_jcc value: [0.73478261 0.88333333 0.9125 0.86614173 0.85344828 0.89495798 0.87698413 0.81276596 0.87401575 0.8852459 ] mean value: 0.8594175667469572 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01722074 0.02114892 0.02169371 0.01978874 0.02115488 0.02141595 0.01782823 0.02157593 0.02269053 0.02523351] mean value: 0.020975112915039062 key: score_time value: [0.01061797 0.01195741 0.01185656 0.01192594 0.01196694 0.01201153 0.01211405 0.01274276 0.01431131 0.01298666] mean value: 0.012249112129211426 key: test_mcc value: [0.81203628 0.66628253 0.72760688 0.88527041 0.72760688 0.84544958 0.84307692 0.84307692 0.88289781 0.77487835] mean value: 0.8008182566592692 key: train_mcc value: [0.84181709 0.87136001 0.8004481 0.90499207 0.71142522 0.89664473 0.87722401 0.89655622 0.90180046 0.87805565] mean value: 0.8580323560247036 key: test_accuracy value: [0.90384615 0.82692308 0.84615385 0.94230769 0.84615385 0.92156863 0.92156863 0.92156863 0.94117647 0.88235294] mean value: 0.8953619909502262 key: train_accuracy value: [0.91792657 0.93304536 0.89416847 0.9524838 0.83801296 0.94827586 0.9375 0.94827586 0.95043103 0.9375 ] mean value: 0.9257619907648768 key: test_fscore value: [0.89361702 0.84210526 0.81818182 0.94117647 0.86666667 0.91666667 0.92 0.92 0.93877551 0.88888889] mean value: 0.8946078305630848 key: train_fscore value: [0.91162791 0.93555094 0.88192771 0.95196507 0.85768501 0.94713656 0.93424036 0.94736842 0.94854586 0.93920335] mean value: 0.9255251191697211 key: test_precision value: [0.95454545 0.77419355 1. 
0.96 0.76470588 0.95652174 0.92 0.92 0.95833333 0.82758621] mean value: 0.9035886164645812 key: train_precision value: [0.97512438 0.88932806 0.97860963 0.94782609 0.75585284 0.95555556 0.97169811 0.95154185 0.97247706 0.90322581] mean value: 0.9301239386440059 key: test_recall value: [0.84 0.92307692 0.69230769 0.92307692 1. 0.88 0.92 0.92 0.92 0.96 ] mean value: 0.8978461538461538 key: train_recall value: [0.8558952 0.98684211 0.80263158 0.95614035 0.99122807 0.93886463 0.89956332 0.94323144 0.92576419 0.97816594] mean value: 0.9278326821420363 key: test_roc_auc value: [0.90148148 0.82692308 0.84615385 0.94230769 0.84615385 0.92076923 0.92153846 0.92153846 0.94076923 0.88384615] mean value: 0.8951481481481481 key: train_roc_auc value: [0.91726384 0.93384658 0.89280515 0.95253826 0.84029489 0.94815572 0.9370157 0.94821147 0.95011614 0.93801914] mean value: 0.9258266884068974 key: test_jcc value: [0.80769231 0.72727273 0.69230769 0.88888889 0.76470588 0.84615385 0.85185185 0.85185185 0.88461538 0.8 ] mean value: 0.8115340432987492 key: train_jcc value: [0.83760684 0.87890625 0.7887931 0.90833333 0.75083056 0.89958159 0.87659574 0.9 0.90212766 0.88537549] mean value: 0.8628150577457124 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.1831696 0.17794561 0.17552376 0.17649794 0.17604017 0.18365979 0.17523932 0.17550063 0.17589378 0.1765306 ] mean value: 0.17760012149810792 key: score_time value: [0.01525378 0.01530957 0.01576948 0.0153296 0.01532698 0.01565194 0.01529312 0.01551795 0.01531744 0.01527667] mean value: 0.015404653549194337 key: test_mcc value: [0.92592593 0.92307692 0.9258201 0.92307692 0.9258201 1. 0.88289781 0.96153846 0.84307692 0.92427578] mean value: 0.9235508946829519 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96153846 0.96153846 0.96153846 0.96153846 0.96153846 1. 0.94117647 0.98039216 0.92156863 0.96078431] mean value: 0.9611613876319759 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96153846 0.96153846 0.96 0.96153846 0.96296296 1. 0.93877551 0.98039216 0.92 0.95833333] mean value: 0.9605079347978508 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92592593 0.96153846 1. 0.96153846 0.92857143 1. 0.95833333 0.96153846 0.92 1. ] mean value: 0.9617446072446073 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96153846 0.92307692 0.96153846 1. 1. 0.92 1. 0.92 0.92 ] mean value: 0.9606153846153846 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96296296 0.96153846 0.96153846 0.96153846 0.96153846 1. 
0.94076923 0.98076923 0.92153846 0.96 ] mean value: 0.9612193732193732 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92592593 0.92592593 0.92307692 0.92592593 0.92857143 1. 0.88461538 0.96153846 0.85185185 0.92 ] mean value: 0.9247431827431828 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.98 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06544733 0.06211853 0.06923485 0.07710958 0.07985926 0.07961392 0.05945921 0.07828259 0.06221557 0.06299257] mean value: 0.0696333408355713 key: score_time value: [0.02117729 0.03006291 0.02708936 0.02777982 0.02915573 0.03034329 0.02469993 0.02470422 0.03304958 0.02316904] mean value: 0.027123117446899415 key: test_mcc value: [0.89087081 0.92307692 0.96225045 0.92307692 0.9258201 1. 0.80431528 0.96153846 0.88307692 0.92153846] mean value: 0.9195564331630951 key: train_mcc value: [0.98275766 0.98275637 0.99135872 0.99568837 0.99135872 0.98275574 0.9870767 0.99569843 0.98290567 0.9870767 ] mean value: 0.9879433054726061 key: test_accuracy value: [0.94230769 0.96153846 0.98076923 0.96153846 0.96153846 1. 
0.90196078 0.98039216 0.94117647 0.96078431] mean value: 0.9592006033182504 key: train_accuracy value: [0.99136069 0.99136069 0.99568035 0.99784017 0.99568035 0.99137931 0.99353448 0.99784483 0.99137931 0.99353448] mean value: 0.9939594660013406 key: test_fscore value: [0.94339623 0.96153846 0.98039216 0.96153846 0.96296296 1. 0.89795918 0.98039216 0.94117647 0.96 ] mean value: 0.9589356080442175 key: train_fscore value: [0.99130435 0.99126638 0.99561404 0.9978022 0.99561404 0.99126638 0.99346405 0.99782135 0.99134199 0.99346405] mean value: 0.9938958813575108 key: test_precision value: [0.89285714 0.96153846 1. 0.96153846 0.92857143 1. 0.91666667 0.96153846 0.92307692 0.96 ] mean value: 0.9505787545787546 key: train_precision value: [0.98701299 0.98695652 0.99561404 1. 0.99561404 0.99126638 0.99130435 0.99565217 0.98283262 0.99130435] mean value: 0.9917557442064376 key: test_recall value: [1. 0.96153846 0.96153846 0.96153846 1. 1. 0.88 1. 0.96 0.96 ] mean value: 0.9684615384615385 key: train_recall value: [0.99563319 0.99561404 0.99561404 0.99561404 0.99561404 0.99126638 0.99563319 1. 1. 0.99563319] mean value: 0.9960622079215506 key: test_roc_auc value: [0.94444444 0.96153846 0.98076923 0.96153846 0.96153846 1. 0.90153846 0.98076923 0.94153846 0.96076923] mean value: 0.9594444444444444 key: train_roc_auc value: [0.99140634 0.99142404 0.99567936 0.99780702 0.99567936 0.99137787 0.99356127 0.99787234 0.99148936 0.99356127] mean value: 0.9939858230006007 key: test_jcc value: [0.89285714 0.92592593 0.96153846 0.92592593 0.92857143 1. 
0.81481481 0.96153846 0.88888889 0.92307692] mean value: 0.9223137973137974 key: train_jcc value: [0.98275862 0.98268398 0.99126638 0.99561404 0.99126638 0.98268398 0.98701299 0.99565217 0.98283262 0.98701299] mean value: 0.9878784138201812 MCC on Blind test: 0.9 Accuracy on Blind test: 0.95 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.09743285 0.13634515 0.15849376 0.17319727 0.15302992 0.16544819 0.15362334 0.14878941 0.14881253 0.15468121] mean value: 0.14898536205291749 key: score_time value: [0.0148344 0.01481652 0.02447915 0.02398372 0.02386975 0.02401042 0.02402806 0.02415848 0.02408814 0.024122 ] mean value: 0.022239065170288085 key: test_mcc value: [0.65330526 0.53846154 0.6172134 0.466924 0.62279916 0.68875274 0.68615385 0.84307692 0.68615385 0.61017022] mean value: 0.6413010922501776 key: train_mcc value: [0.98712064 0.99568837 0.98711849 0.98711849 0.98711849 0.98714723 0.99141377 0.98714723 0.98714723 0.98714723] mean value: 0.9884167178261011 key: test_accuracy value: [0.82692308 0.76923077 0.80769231 0.71153846 0.80769231 0.84313725 0.84313725 0.92156863 0.84313725 0.80392157] mean value: 0.8177978883861237 key: train_accuracy value: [0.99352052 0.99784017 0.99352052 0.99352052 
0.99352052 0.99353448 0.99568966 0.99353448 0.99353448 0.99353448] mean value: 0.9941749832427199 key: test_fscore value: [0.81632653 0.76923077 0.8 0.63414634 0.82142857 0.84615385 0.84 0.92 0.84 0.80769231] mean value: 0.8094978366581154 key: train_fscore value: [0.99340659 0.9978022 0.99337748 0.99337748 0.99337748 0.99340659 0.99561404 0.99340659 0.99340659 0.99340659] mean value: 0.994058165025401 key: test_precision value: [0.83333333 0.76923077 0.83333333 0.86666667 0.76666667 0.81481481 0.84 0.92 0.84 0.77777778] mean value: 0.8261823361823362 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8 0.76923077 0.76923077 0.5 0.88461538 0.88 0.84 0.92 0.84 0.84 ] mean value: 0.8043076923076923 key: train_recall value: [0.98689956 0.99561404 0.98684211 0.98684211 0.98684211 0.98689956 0.99126638 0.98689956 0.98689956 0.98689956] mean value: 0.988190454301693 key: test_roc_auc value: [0.82592593 0.76923077 0.80769231 0.71153846 0.80769231 0.84384615 0.84307692 0.92153846 0.84307692 0.80461538] mean value: 0.8178233618233618 key: train_roc_auc value: [0.99344978 0.99780702 0.99342105 0.99342105 0.99342105 0.99344978 0.99563319 0.99344978 0.99344978 0.99344978] mean value: 0.9940952271508465 key: test_jcc value: [0.68965517 0.625 0.66666667 0.46428571 0.6969697 0.73333333 0.72413793 0.85185185 0.72413793 0.67741935] mean value: 0.6853457652428732 key: train_jcc value: [0.98689956 0.99561404 0.98684211 0.98684211 0.98684211 0.98689956 0.99126638 0.98689956 0.98689956 0.98689956] mean value: 0.988190454301693 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.71794391 0.7203269 0.72270894 0.72166705 0.71613026 0.71105385 0.71652889 0.72433043 0.72593617 0.72219372] mean value: 0.7198820114135742 key: score_time value: [0.00964975 0.00945163 0.00943828 0.00946069 0.00968385 0.00951886 0.00939965 0.00939608 0.00974894 0.00926399] mean value: 0.009501171112060548 key: test_mcc value: [0.89087081 0.88527041 0.96225045 0.92307692 0.9258201 1. 0.92153846 0.96153846 0.8459178 0.92153846] mean value: 0.9237821874489884 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94230769 0.94230769 0.98076923 0.96153846 0.96153846 1. 0.96078431 0.98039216 0.92156863 0.96078431] mean value: 0.9611990950226245 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94339623 0.94117647 0.98039216 0.96153846 0.96296296 1. 0.96 0.98039216 0.92307692 0.96 ] mean value: 0.9612935358307167 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 0.96 1. 0.96153846 0.92857143 1. 0.96 0.96153846 0.88888889 0.96 ] mean value: 0.9513394383394383 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.92307692 0.96153846 0.96153846 1. 1. 0.96 1. 0.96 0.96 ] mean value: 0.9726153846153847 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.94230769 0.98076923 0.96153846 0.96153846 1. 
0.96076923 0.98076923 0.92230769 0.96076923] mean value: 0.9615213675213675 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.89285714 0.88888889 0.96153846 0.92592593 0.92857143 1. 0.92307692 0.96153846 0.85714286 0.92307692] mean value: 0.9262617012617013 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: 
Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03045988 0.03296375 0.04668927 0.0325253 0.03056479 0.03066826 0.03425074 0.03044724 0.03053427 0.03095007] mean value: 0.033005356788635254 key: score_time value: [0.01308393 0.01344943 0.01736689 0.01323104 0.01809931 0.01370311 0.01521468 0.01508665 0.01611853 0.01422572] mean value: 0.014957928657531738 key: test_mcc value: [0.4637037 0.32338083 0.09128709 0.28697202 0.50037023 0.25161197 0.42192651 0.43108293 0.54660922 0.31510143] mean value: 0.3632045934004423 key: train_mcc value: [0.87029251 0.82908577 0.52028331 0.6055719 0.97411589 0.83676363 0.92126558 0.96566269 0.94140567 0.59943068] mean value: 0.8063877621767472 key: test_accuracy value: [0.73076923 0.65384615 0.53846154 0.63461538 0.75 0.60784314 0.70588235 0.70588235 0.76470588 0.60784314] mean value: 0.6699849170437405 key: train_accuracy value: [0.93088553 0.90712743 
0.71058315 0.76673866 0.98704104 0.91163793 0.95905172 0.98275862 0.96982759 0.76293103] mean value: 0.8888582706486929 key: test_fscore value: [0.73076923 0.7 0.63636364 0.68852459 0.75471698 0.67741935 0.72727273 0.73684211 0.78571429 0.70588235] mean value: 0.7143505264458934 key: train_fscore value: [0.93469388 0.91382766 0.77288136 0.80851064 0.98689956 0.91783567 0.96016771 0.98268398 0.97033898 0.80633803] mean value: 0.905417747054172 key: test_precision value: [0.7037037 0.61764706 0.525 0.6 0.74074074 0.56756757 0.66666667 0.65625 0.70967742 0.55813953] mean value: 0.6345392691740768 key: train_precision value: [0.87739464 0.84132841 0.62983425 0.67857143 0.9826087 0.84814815 0.9233871 0.97424893 0.94238683 0.67551622] mean value: 0.8373424655092186 key: test_recall value: [0.76 0.80769231 0.80769231 0.80769231 0.76923077 0.84 0.8 0.84 0.88 0.96 ] mean value: 0.8272307692307692 key: train_recall value: [1. 1. 1. 1. 0.99122807 1. 1. 0.99126638 1. 1. ] mean value: 0.998249444572129 key: test_roc_auc value: [0.73185185 0.65384615 0.53846154 0.63461538 0.75 0.61230769 0.70769231 0.70846154 0.76692308 0.61461538] mean value: 0.6718774928774929 key: train_roc_auc value: [0.93162393 0.90851064 0.71489362 0.77021277 0.9871034 0.91276596 0.95957447 0.98286723 0.97021277 0.76595745] mean value: 0.8903722218314364 key: test_jcc value: [0.57575758 0.53846154 0.46666667 0.525 0.60606061 0.51219512 0.57142857 0.58333333 0.64705882 0.54545455] mean value: 0.5571416782643468 key: train_jcc value: [0.87739464 0.84132841 0.62983425 0.67857143 0.97413793 0.84814815 0.9233871 0.96595745 0.94238683 0.67551622] mean value: 0.8356662410244379 MCC on Blind test: 0.45 Accuracy on Blind test: 0.69 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), 
('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 
'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02886796 0.03506136 0.04856014 0.03749013 0.03853464 0.03879905 0.03333855 0.03817081 0.0379436 0.03178 ] mean value: 0.03685462474822998 key: score_time value: [0.0193758 0.01948166 0.02359152 0.02530694 0.02442861 0.02386785 0.01905274 0.02539372 0.02083659 0.01895976] mean value: 0.022029519081115723 key: test_mcc value: [0.80829038 0.65433031 0.84866842 0.88527041 0.80829038 0.88289781 0.76733527 0.88289781 0.80990051 0.73107432] mean value: 0.8078955626786625 key: train_mcc value: [0.86208312 0.86630587 0.85815088 0.84902492 0.84879533 0.85375825 0.85375825 0.86645175 0.85797371 0.84920893] mean value: 0.856551100724223 key: test_accuracy value: [0.90384615 0.82692308 0.92307692 0.94230769 0.90384615 0.94117647 0.88235294 0.94117647 0.90196078 0.8627451 ] mean value: 0.9029411764705882 key: train_accuracy value: [0.93088553 0.93304536 0.9287257 0.92440605 0.92440605 0.92672414 0.92672414 0.93318966 0.92887931 0.92456897] mean value: 0.9281554889401952 key: test_fscore value: [0.90196078 0.83018868 0.92 0.94117647 0.90566038 0.93877551 0.88461538 0.93877551 0.90566038 0.86792453] mean value: 0.9034737622189659 key: train_fscore value: [0.93103448 0.93275488 0.92903226 0.92407809 0.92341357 0.92672414 0.92672414 0.93275488 0.9287257 0.92407809] mean value: 0.9279320228969524 key: test_precision value: [0.88461538 0.81481481 0.95833333 0.96 0.88888889 0.95833333 0.85185185 0.95833333 0.85714286 0.82142857] mean value: 0.8953742368742369 key: 
train_precision value: [0.91914894 0.92274678 0.91139241 0.91416309 0.92139738 0.91489362 0.91489362 0.92672414 0.91880342 0.91810345] mean value: 0.9182266831443672 key: test_recall value: [0.92 0.84615385 0.88461538 0.92307692 0.92307692 0.92 0.92 0.92 0.96 0.92 ] mean value: 0.9136923076923077 key: train_recall value: [0.94323144 0.94298246 0.94736842 0.93421053 0.9254386 0.93886463 0.93886463 0.93886463 0.93886463 0.930131 ] mean value: 0.9378820960698689 key: test_roc_auc value: [0.90444444 0.82692308 0.92307692 0.94230769 0.90384615 0.94076923 0.88307692 0.94076923 0.90307692 0.86384615] mean value: 0.9032136752136752 key: train_roc_auc value: [0.93101743 0.93319336 0.92900336 0.92455207 0.92442143 0.92687912 0.92687912 0.9332621 0.92900678 0.92463997] mean value: 0.9282854742942543 key: test_jcc value: [0.82142857 0.70967742 0.85185185 0.88888889 0.82758621 0.88461538 0.79310345 0.88461538 0.82758621 0.76666667] mean value: 0.8256020029490552 key: train_jcc value: [0.87096774 0.87398374 0.86746988 0.85887097 0.85772358 0.86345382 0.86345382 0.87398374 0.86693548 0.85887097] mean value: 0.8655713728241052 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive 
Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.19555092 0.28002501 0.27410674 0.29807663 0.32655668 0.30876517 0.29639506 0.28637075 0.28587222 0.27678657] mean value: 0.28285057544708253 key: score_time value: [0.01900291 0.01891041 0.01891088 0.02085233 0.0235498 0.01884913 0.01897097 0.02552199 0.02648234 0.02427053] mean value: 0.021532130241394044 key: test_mcc value: [0.80829038 0.65433031 0.84866842 0.88527041 0.80829038 0.88289781 0.76733527 0.88289781 0.80990051 0.73107432] mean value: 0.8078955626786625 key: train_mcc value: [0.86208312 0.86630587 0.80159752 0.84902492 0.84879533 0.85375825 0.85375825 0.86645175 0.90108236 0.84920893] mean value: 0.8552066307077918 key: test_accuracy value: [0.90384615 0.82692308 0.92307692 0.94230769 0.90384615 0.94117647 0.88235294 0.94117647 0.90196078 0.8627451 ] mean value: 0.9029411764705882 key: train_accuracy value: [0.93088553 0.93304536 0.90064795 0.92440605 0.92440605 0.92672414 0.92672414 0.93318966 0.95043103 0.92456897] mean value: 0.9275028859760185 key: test_fscore value: [0.90196078 0.83018868 0.92 0.94117647 0.90566038 0.93877551 0.88461538 0.93877551 0.90566038 0.86792453] mean value: 0.9034737622189659 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:107: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) 
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:110: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.93103448 0.93275488 0.9004329 0.92407809 0.92341357 0.92672414 0.92672414 0.93275488 0.95032397 0.92407809] mean value: 0.9272319143476138 key: test_precision value: [0.88461538 0.81481481 0.95833333 0.96 0.88888889 0.95833333 0.85185185 0.95833333 0.85714286 0.82142857] mean value: 0.8953742368742369 key: train_precision value: [0.91914894 0.92274678 0.88888889 0.91416309 0.92139738 0.91489362 0.91489362 0.92672414 0.94017094 0.91810345] mean value: 0.918113083663679 key: test_recall value: [0.92 0.84615385 0.88461538 0.92307692 0.92307692 0.92 0.92 0.92 0.96 0.92 ] mean value: 0.9136923076923077 key: train_recall value: [0.94323144 0.94298246 0.9122807 0.93421053 0.9254386 0.93886463 0.93886463 0.93886463 0.96069869 0.930131 ] mean value: 0.9365567302535815 key: test_roc_auc value: [0.90444444 0.82692308 0.92307692 0.94230769 0.90384615 0.94076923 0.88307692 0.94076923 0.90307692 0.86384615] mean value: 0.9032136752136752 key: train_roc_auc value: [0.93101743 0.93319336 0.9008212 0.92455207 0.92442143 0.92687912 0.92687912 0.9332621 0.95056211 0.92463997] mean value: 0.9276227913861107 key: test_jcc value: [0.82142857 0.70967742 0.85185185 
0.88888889 0.82758621 0.88461538 0.79310345 0.88461538 0.82758621 0.76666667] mean value: 0.8256020029490552 key: train_jcc value: [0.87096774 0.87398374 0.81889764 0.85887097 0.85772358 0.86345382 0.86345382 0.87398374 0.90534979 0.85887097] mean value: 0.8645555796885971 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), 
('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
('Gaussian Process', GaussianProcessClassifier(random_state=42)),
('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
('QDA', QuadraticDiscriminantAnalysis()),
('Ridge Classifier', RidgeClassifier(random_state=42)),
('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03841734 0.03949714 0.03965116 0.03697515 0.03645062 0.03646588 0.03761983 0.09846282 0.04134083 0.04195118]
mean value: 0.044683194160461424

key: score_time
value: [0.0146718 0.01451349 0.01439214 0.0156827 0.01446724 0.01457667 0.01458859 0.01217008 0.01210904 0.01474524]
mean value: 0.014191699028015137

key: test_mcc
value: [0.85164138 0.73997003 0.77849894 0.96225045 0.89056356 0.74466871 0.76923077 0.88527041 0.81312325 0.79056942]
mean value: 0.8225786910541291

key: train_mcc
value: [0.85946342 0.88089135 0.87262489 0.86395495 0.86815585 0.87246682 0.88113831 0.86411148 0.86411148 0.87660368]
mean value: 0.8703522241153047

key: test_accuracy
value: [0.9245283 0.86792453 0.88461538 0.98076923 0.94230769 0.86538462 0.88461538 0.94230769 0.90384615 0.88461538]
mean value: 0.9080914368650218

key: train_accuracy
value: [0.92963753 0.94029851 0.93617021 0.93191489 0.93404255 0.93617021 0.94042553 0.93191489 0.93191489 0.93829787]
mean value: 0.9350787097944926

key: test_fscore
value: [0.92592593 0.87719298 0.875 0.98113208 0.94545455 0.85106383 0.88461538 0.94117647 0.90909091 0.89655172]
mean value: 0.9087203847528004

key: train_fscore
value: [0.93052632 0.94092827 0.93697479 0.93248945 0.93446089 0.93670886 0.94117647 0.93277311 0.93277311 0.93842887]
mean value: 0.9357240139743418

key: test_precision
value: [0.89285714 0.83333333 0.95454545 0.96296296 0.89655172 0.95238095 0.88461538 0.96 0.86206897 0.8125 ]
mean value: 0.9011815920350403

key: train_precision
value: [0.92083333 0.92916667 0.9253112 0.92468619 0.92857143 0.92887029 0.92946058 0.92116183 0.92116183 0.93644068]
mean value: 0.9265664027577826

key: test_recall
value: [0.96153846 0.92592593 0.80769231 1. 1. 0.76923077 0.88461538 0.92307692 0.96153846 1. ]
mean value: 0.9233618233618234

key: train_recall
value: [0.94042553 0.95299145 0.94893617 0.94042553 0.94042553 0.94468085 0.95319149 0.94468085 0.94468085 0.94042553]
mean value: 0.9450863793416985

key: test_roc_auc
value: [0.92521368 0.86680912 0.88461538 0.98076923 0.94230769 0.86538462 0.88461538 0.94230769 0.90384615 0.88461538]
mean value: 0.9080484330484331

key: train_roc_auc
value: [0.92961448 0.94032551 0.93617021 0.93191489 0.93404255 0.93617021 0.94042553 0.93191489 0.93191489 0.93829787]
mean value: 0.9350791052918713

key: test_jcc
value: [0.86206897 0.78125 0.77777778 0.96296296 0.89655172 0.74074074 0.79310345 0.88888889 0.83333333 0.8125 ]
mean value: 0.8349177841634738

key: train_jcc
value: [0.87007874 0.88844622 0.88142292 0.87351779 0.87698413 0.88095238 0.88888889 0.87401575 0.87401575 0.884 ]
mean value: 0.8792322559647762

MCC on Blind test: 0.77
Accuracy on Blind test: 0.88

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.88060451 1.01606941 0.91109967 1.10975718 0.93197393 0.99338555 0.97769165 0.90921044 0.9921577 0.92200208] mean value: 0.9643952131271363 key: score_time value: [0.01480174 0.04356098 0.01479554 0.01655483 0.02265716 0.04054356 0.01495075 0.0150485 0.01497436 0.01514244] mean value: 0.02130298614501953 key: test_mcc value: [0.81196581 0.8116984 0.77849894 0.96225045 0.89056356 0.81312325 0.84615385 0.84866842 0.84866842 0.82305489] mean value: 0.8434645996919574 key: train_mcc value: [0.91474349 0.91484796 0.90667855 0.90220118 0.91064654 0.90233192 0.90667855 0.90233192 0.88965172 0.90220118] mean value: 0.9052313012071413 key: test_accuracy value: [0.90566038 0.90566038 0.88461538 0.98076923 0.94230769 0.90384615 0.92307692 0.92307692 0.92307692 0.90384615] mean value: 0.9195936139332366 key: train_accuracy value: [0.95735608 0.95735608 0.95319149 0.95106383 0.95531915 0.95106383 0.95319149 
0.95106383 0.94468085 0.95106383] mean value: 0.9525350451390464 key: test_fscore value: [0.90566038 0.90909091 0.875 0.98113208 0.94545455 0.89795918 0.92307692 0.92 0.92592593 0.9122807 ] mean value: 0.9195580641806348 key: train_fscore value: [0.95762712 0.95762712 0.95378151 0.95137421 0.95541401 0.95157895 0.95378151 0.95157895 0.94537815 0.95137421] mean value: 0.9529515735610741 key: test_precision value: [0.88888889 0.89285714 0.95454545 0.96296296 0.89655172 0.95652174 0.92307692 0.95833333 0.89285714 0.83870968] mean value: 0.916530498920957 key: train_precision value: [0.9535865 0.94957983 0.94190871 0.94537815 0.95338983 0.94166667 0.94190871 0.94166667 0.93360996 0.94537815] mean value: 0.9448073182078001 key: test_recall value: [0.92307692 0.92592593 0.80769231 1. 1. 0.84615385 0.92307692 0.88461538 0.96153846 1. ] mean value: 0.9272079772079772 key: train_recall value: [0.96170213 0.96581197 0.96595745 0.95744681 0.95744681 0.96170213 0.96595745 0.96170213 0.95744681 0.95744681] mean value: 0.9612620476450263 key: test_roc_auc value: [0.90598291 0.90527066 0.88461538 0.98076923 0.94230769 0.90384615 0.92307692 0.92307692 0.92307692 0.90384615] mean value: 0.9195868945868946 key: train_roc_auc value: [0.95734679 0.95737407 0.95319149 0.95106383 0.95531915 0.95106383 0.95319149 0.95106383 0.94468085 0.95106383] mean value: 0.952535915621022 key: test_jcc value: [0.82758621 0.83333333 0.77777778 0.96296296 0.89655172 0.81481481 0.85714286 0.85185185 0.86206897 0.83870968] mean value: 0.8522800171854676 key: train_jcc value: [0.91869919 0.91869919 0.91164659 0.90725806 0.91463415 0.90763052 0.91164659 0.90763052 0.89641434 0.90725806] mean value: 0.9101517208854413 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01405215 0.01066756 0.01140666 0.012398 0.01129866 0.01105595 0.01146221 0.01126456 0.01135373 0.01132464] mean value: 0.011628413200378418 key: score_time value: [0.01219368 0.00984573 0.00990057 0.00974965 0.00986099 0.00926423 0.00986862 0.00995731 0.0097661 0.00981498] mean value: 0.010022187232971191 key: test_mcc value: [0.66048569 0.40912228 0.71151247 0.77151675 0.80829038 0.65824263 0.57735027 0.50037023 0.54006172 0.73568294] mean value: 0.6372635364356517 key: train_mcc value: [0.66698754 0.68740344 0.69667663 0.67751905 0.69117257 0.71834239 0.67337154 0.67751905 0.67558392 0.69117257] mean value: 0.6855748724120782 key: test_accuracy value: [0.83018868 0.69811321 0.84615385 0.88461538 0.90384615 0.82692308 0.78846154 0.75 0.76923077 0.86538462] mean value: 0.8162917271407837 key: train_accuracy value: [0.8315565 0.84221748 0.84680851 0.83617021 0.84468085 0.85744681 0.83617021 0.83617021 0.83617021 0.84468085] mean value: 0.8412071859547249 key: test_fscore value: [0.82352941 0.66666667 0.82608696 0.88 0.90196078 0.81632653 0.78431373 0.75471698 0.76 0.85714286] mean value: 0.807074391364421 key: train_fscore value: [0.82247191 0.83408072 0.83928571 0.82539683 0.8388521 0.85011186 0.84057971 0.82539683 0.82774049 0.8388521 ] mean value: 0.8342768246079215 key: test_precision value: [0.84 0.76190476 0.95 0.91666667 0.92 0.86956522 0.8 0.74074074 0.79166667 0.91304348] mean value: 0.8503587531631009 
key: train_precision value: [0.87142857 0.87735849 0.88262911 0.88349515 0.87155963 0.89622642 0.81854839 0.88349515 0.87264151 0.87155963] mean value: 0.8728942038918088 key: test_recall value: [0.80769231 0.59259259 0.73076923 0.84615385 0.88461538 0.76923077 0.76923077 0.76923077 0.73076923 0.80769231] mean value: 0.7707977207977208 key: train_recall value: [0.7787234 0.79487179 0.8 0.77446809 0.80851064 0.80851064 0.86382979 0.77446809 0.78723404 0.80851064] mean value: 0.7999127114020731 key: test_roc_auc value: [0.82977208 0.70014245 0.84615385 0.88461538 0.90384615 0.82692308 0.78846154 0.75 0.76923077 0.86538462] mean value: 0.8164529914529915 key: train_roc_auc value: [0.83166939 0.84211675 0.84680851 0.83617021 0.84468085 0.85744681 0.83617021 0.83617021 0.83617021 0.84468085] mean value: 0.8412084015275505 key: test_jcc value: [0.7 0.5 0.7037037 0.78571429 0.82142857 0.68965517 0.64516129 0.60606061 0.61290323 0.75 ] mean value: 0.6814626855449992 key: train_jcc value: [0.69847328 0.71538462 0.72307692 0.7027027 0.72243346 0.73929961 0.725 0.7027027 0.70610687 0.72243346] mean value: 0.7157613627585733 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01250458 0.01152492 0.01148963 0.01054597 0.01109338 0.01149631 0.01151705 0.0116148 0.01161146 0.01170826] mean value: 0.011510634422302246 key: score_time value: [0.01028419 0.00985003 0.00978994 0.00903082 0.00983238 0.00987172 0.01005745 0.01006413 0.00986838 0.00981331] mean value: 0.009846234321594238 key: test_mcc value: [0.73646724 0.50997151 0.6172134 0.88527041 0.69436507 0.69230769 0.69436507 0.77151675 0.69436507 0.69230769] mean value: 0.6988149917958625 key: train_mcc value: [0.74840423 0.76129503 0.71066404 0.74894295 0.75778307 0.77046393 0.71925314 0.68550371 0.74075423 0.75330062] mean value: 0.739636494142086 key: test_accuracy value: [0.86792453 0.75471698 0.80769231 0.94230769 0.84615385 0.84615385 0.84615385 0.88461538 0.84615385 0.84615385] mean value: 0.8488026124818577 key: train_accuracy value: [0.87420043 0.88059701 0.85531915 0.87446809 0.8787234 0.88510638 0.85957447 0.84255319 0.87021277 0.87659574] mean value: 0.8697350632853967 key: test_fscore value: [0.86792453 0.75471698 0.8 0.94117647 0.85185185 0.84615385 0.84 0.88 0.85185185 0.84615385] mean value: 0.8479829376033594 key: train_fscore value: [0.87473461 0.87931034 0.85470085 0.87473461 0.88050314 0.88655462 0.8583691 0.83982684 0.86825054 0.87553648] mean value: 0.869252113965142 key: test_precision value: [0.85185185 0.76923077 0.83333333 0.96 0.82142857 0.84615385 0.875 0.91666667 0.82142857 0.84615385] mean value: 0.8541247456247456 key: train_precision value: [0.87288136 0.88695652 0.8583691 0.87288136 0.8677686 0.87551867 0.86580087 0.85462555 0.88157895 0.88311688] mean value: 
0.8719497846503439 key: test_recall value: [0.88461538 0.74074074 0.76923077 0.92307692 0.88461538 0.84615385 0.80769231 0.84615385 0.88461538 0.84615385] mean value: 0.8433048433048433 key: train_recall value: [0.87659574 0.87179487 0.85106383 0.87659574 0.89361702 0.89787234 0.85106383 0.82553191 0.85531915 0.86808511] mean value: 0.8667539552645935 key: test_roc_auc value: [0.86823362 0.75498575 0.80769231 0.94230769 0.84615385 0.84615385 0.84615385 0.88461538 0.84615385 0.84615385] mean value: 0.8488603988603989 key: train_roc_auc value: [0.87419531 0.88057829 0.85531915 0.87446809 0.8787234 0.88510638 0.85957447 0.84255319 0.87021277 0.87659574] mean value: 0.8697326786688488 key: test_jcc value: [0.76666667 0.60606061 0.66666667 0.88888889 0.74193548 0.73333333 0.72413793 0.78571429 0.74193548 0.73333333] mean value: 0.7388672679440199 key: train_jcc value: [0.77735849 0.78461538 0.74626866 0.77735849 0.78651685 0.79622642 0.7518797 0.7238806 0.76717557 0.77862595] mean value: 0.7689906114471405 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00976706 0.01079893 0.01038599 0.01070499 0.01065278 0.01069951 0.0106945 0.0093689 0.01063561 0.00939131] mean value: 0.010309958457946777 key: score_time value: [0.01736188 0.01571083 0.01535082 0.01273298 0.01242304 0.01235628 0.01241827 0.01493406 0.01261854 0.01222396] mean value: 0.013813066482543945 key: test_mcc value: [0.58487934 0.36194897 0.30769231 0.5990423 0.66628253 0.4233902 0.43929769 0.73568294 0.6172134 0.31139958] mean value: 0.504682924498233 key: train_mcc value: [0.72752093 0.72748132 0.73197454 0.69792921 0.71495188 0.71490009 0.72768593 0.69364214 0.70823856 0.73223982] mean value: 0.7176564421606096 key: test_accuracy value: [0.79245283 0.67924528 0.65384615 0.78846154 0.82692308 0.71153846 0.71153846 0.86538462 0.80769231 0.65384615] mean value: 0.7490928882438317 key: train_accuracy value: [0.86353945 0.86353945 0.86595745 0.84893617 0.85744681 0.85744681 0.86382979 0.84680851 0.85319149 0.86595745] mean value: 0.8586653359343102 key: test_fscore value: [0.78431373 0.66666667 0.65384615 0.75555556 0.84210526 0.71698113 0.66666667 0.87272727 0.8 0.625 ] mean value: 0.7383862436185877 key: train_fscore value: [0.86147186 0.86086957 0.86509636 0.84796574 0.85653105 0.85714286 0.86324786 0.84615385 0.84768212 0.86393089] mean value: 0.8570092145719881 key: test_precision value: [0.8 0.70833333 0.65384615 0.89473684 0.77419355 0.7037037 0.78947368 0.82758621 0.83333333 0.68181818] mean value: 0.7667024987634145 key: train_precision value: [0.87665198 0.87610619 0.87068966 0.85344828 0.86206897 0.85897436 0.86695279 0.84978541 0.88073394 
0.87719298] mean value: 0.8672604557430365 key: test_recall value: [0.76923077 0.62962963 0.65384615 0.65384615 0.92307692 0.73076923 0.57692308 0.92307692 0.76923077 0.57692308] mean value: 0.7206552706552707 key: train_recall value: [0.84680851 0.84615385 0.85957447 0.84255319 0.85106383 0.85531915 0.85957447 0.84255319 0.81702128 0.85106383] mean value: 0.8471685761047463 key: test_roc_auc value: [0.79202279 0.68019943 0.65384615 0.78846154 0.82692308 0.71153846 0.71153846 0.86538462 0.80769231 0.65384615] mean value: 0.7491452991452991 key: train_roc_auc value: [0.8635752 0.86350245 0.86595745 0.84893617 0.85744681 0.85744681 0.86382979 0.84680851 0.85319149 0.86595745] mean value: 0.8586652118567013 key: test_jcc value: [0.64516129 0.5 0.48571429 0.60714286 0.72727273 0.55882353 0.5 0.77419355 0.66666667 0.45454545] mean value: 0.5919520359463434 key: train_jcc value: [0.75665399 0.75572519 0.76226415 0.73605948 0.74906367 0.75 0.7593985 0.73333333 0.73563218 0.76045627] mean value: 0.7498586771390656 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02233267 0.02127671 0.02358603 0.02476525 0.02091074 0.02076507 0.02088213 0.02093339 0.02088308 0.02073622] mean value: 0.02170712947845459 key: score_time value: [0.01467752 0.01178527 0.01262712 0.01558471 0.01146078 0.01158738 0.01152992 0.01151848 0.01147294 0.01148391] mean value: 0.012372803688049317 key: test_mcc value: [0.81196581 0.69957726 0.74466871 0.92307692 0.84866842 0.77849894 0.73131034 0.88527041 0.81312325 0.73568294] mean value: 0.7971843013202676 key: train_mcc value: [0.79530824 0.80810708 0.80451759 0.78298581 0.79149653 0.80018114 0.80000724 0.78726255 0.7957735 0.80428445] mean value: 0.7969924124303265 key: test_accuracy value: [0.90566038 0.8490566 0.86538462 0.96153846 0.92307692 0.88461538 0.86538462 0.94230769 0.90384615 0.86538462] mean value: 0.8966255442670538 key: train_accuracy value: [0.89765458 0.90405117 0.90212766 0.89148936 0.89574468 0.9 0.9 0.89361702 0.89787234 0.90212766] mean value: 0.8984684480333893 key: test_fscore value: [0.90566038 0.85714286 0.85106383 0.96153846 0.92592593 0.875 0.86792453 0.94117647 0.90909091 0.87272727] mean value: 0.8967250632461273 key: train_fscore value: [0.89787234 0.90364026 0.90336134 0.89171975 0.89552239 0.90105263 0.90021231 0.8940678 0.8974359 0.9017094 ] mean value: 0.8986594116764762 key: test_precision value: [0.88888889 0.82758621 0.95238095 0.96153846 0.89285714 0.95454545 0.85185185 0.96 0.86206897 0.82758621] mean value: 0.8979304131373097 key: train_precision value: [0.89787234 0.9055794 0.89211618 0.88983051 0.8974359 0.89166667 0.89830508 0.89029536 0.90128755 0.9055794 
] mean value: 0.8969968390902169 key: test_recall value: [0.92307692 0.88888889 0.76923077 0.96153846 0.96153846 0.80769231 0.88461538 0.92307692 0.96153846 0.92307692] mean value: 0.9004273504273504 key: train_recall value: [0.89787234 0.9017094 0.91489362 0.89361702 0.89361702 0.9106383 0.90212766 0.89787234 0.89361702 0.89787234] mean value: 0.9003837061283869 key: test_roc_auc value: [0.90598291 0.8482906 0.86538462 0.96153846 0.92307692 0.88461538 0.86538462 0.94230769 0.90384615 0.86538462] mean value: 0.8965811965811966 key: train_roc_auc value: [0.89765412 0.90404619 0.90212766 0.89148936 0.89574468 0.9 0.9 0.89361702 0.89787234 0.90212766] mean value: 0.8984679032551373 key: test_jcc value: [0.82758621 0.75 0.74074074 0.92592593 0.86206897 0.77777778 0.76666667 0.88888889 0.83333333 0.77419355] mean value: 0.8147182054134223 key: train_jcc value: [0.81467181 0.82421875 0.82375479 0.8045977 0.81081081 0.81992337 0.81853282 0.80842912 0.81395349 0.82101167] mean value: 0.81599043363822 MCC on Blind test: 0.67 Accuracy on Blind test: 0.83 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.43118334 1.92359424 1.93823981 1.90809631 1.79002547 0.74983549 1.94170189 1.90422225 1.8327477 1.80478406] mean value: 1.7224430561065673 key: score_time value: [0.0124011 0.01241326 0.01531434 0.02261472 0.01246166 0.01324511 0.01439619 0.02267289 0.02315784 0.0149169 ] mean value: 0.016359400749206544 key: test_mcc value: [0.73646724 0.8116984 0.77849894 0.96225045 0.85634884 0.77849894 0.80829038 0.84615385 0.84615385 0.82305489] mean value: 0.8247415773739302 key: train_mcc value: [0.98297841 0.99147118 0.99148936 0.99152527 0.9873145 0.91084449 0.9957537 0.99148936 0.97894501 0.9957537 ] mean value: 0.981756497283519 key: test_accuracy value: [0.86792453 0.90566038 0.88461538 0.98076923 0.92307692 0.88461538 0.90384615 0.92307692 0.92307692 0.90384615] mean value: 0.9100507982583455 key: train_accuracy value: [0.99147122 0.99573561 0.99574468 0.99574468 0.99361702 0.95531915 0.99787234 0.99574468 0.9893617 0.99787234] mean value: 0.9908483418772399 key: test_fscore value: [0.86792453 0.90909091 0.875 0.98113208 0.92857143 0.875 0.90196078 0.92307692 0.92307692 0.9122807 ] mean value: 0.909711427365788 key: train_fscore value: [0.99145299 0.9957265 0.99574468 0.9957265 0.99357602 0.95578947 
0.9978678 0.99574468 0.98924731 0.99787686] mean value: 0.9908752808838321 key: test_precision value: [0.85185185 0.89285714 0.95454545 0.96296296 0.86666667 0.95454545 0.92 0.92307692 0.92307692 0.83870968] mean value: 0.9088293057002734 key: train_precision value: [0.99570815 0.9957265 0.99574468 1. 1. 0.94583333 1. 0.99574468 1. 0.99576271] mean value: 0.9924520057132802 key: test_recall value: [0.88461538 0.92592593 0.80769231 1. 1. 0.80769231 0.88461538 0.92307692 0.92307692 1. ] mean value: 0.9156695156695157 key: train_recall value: [0.98723404 0.9957265 0.99574468 0.99148936 0.98723404 0.96595745 0.99574468 0.99574468 0.9787234 1. ] mean value: 0.9893598836152028 key: test_roc_auc value: [0.86823362 0.90527066 0.88461538 0.98076923 0.92307692 0.88461538 0.90384615 0.92307692 0.92307692 0.90384615] mean value: 0.9100427350427351 key: train_roc_auc value: [0.99148027 0.99573559 0.99574468 0.99574468 0.99361702 0.95531915 0.99787234 0.99574468 0.9893617 0.99787234] mean value: 0.9908492453173304 key: test_jcc value: [0.76666667 0.83333333 0.77777778 0.96296296 0.86666667 0.77777778 0.82142857 0.85714286 0.85714286 0.83870968] mean value: 0.8359609148318826 key: train_jcc value: [0.98305085 0.99148936 0.99152542 0.99148936 0.98723404 0.91532258 0.99574468 0.99152542 0.9787234 0.99576271] mean value: 0.9821867838488652 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random 
Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.0287106 0.02211738 0.02123046 0.02213478 0.01955199 0.02208471 0.0208869 0.02009106 0.02331758 0.02205229] mean value: 0.022217774391174318 key: score_time value: [0.01214767 0.00964355 0.00877619 0.00889492 0.00886869 0.00891852 0.00902438 0.00888228 0.00897551 0.0089376 ] mean value: 0.009306931495666504 key: test_mcc value: [0.81688878 0.92704716 0.92307692 0.88527041 0.84866842 0.96225045 0.84615385 0.84866842 0.77151675 1. ] mean value: 0.8829541177251579 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90566038 0.96226415 0.96153846 0.94230769 0.92307692 0.98076923 0.92307692 0.92307692 0.88461538 1. ] mean value: 0.9406386066763426 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.96428571 0.96153846 0.94339623 0.92592593 0.98039216 0.92307692 0.92592593 0.88888889 1. ] mean value: 0.9422521132010588 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.86206897 0.93103448 0.96153846 0.92592593 0.89285714 1. 0.92307692 0.89285714 0.85714286 1. ] mean value: 0.9246501901674316 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 0.96153846 0.96153846 0.96153846 0.92307692 0.96153846 0.92307692 1. ] mean value: 0.9615384615384616 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.90669516 0.96153846 0.96153846 0.94230769 0.92307692 0.98076923 0.92307692 0.92307692 0.88461538 1. ] mean value: 0.9406695156695157 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.93103448 0.92592593 0.89285714 0.86206897 0.96153846 0.85714286 0.86206897 0.8 1. ] mean value: 0.8925970134590824 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.93 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.13061213 0.12142134 0.12343693 0.12211251 0.12081861 0.12108946 0.119946 0.12066436 0.12075686 0.12019539] mean value: 0.12210536003112793 key: score_time value: [0.01764417 0.01821375 0.01812148 0.01800776 0.01792765 0.01793003 0.01794624 0.01797581 0.01795745 0.01803088] mean value: 0.017975521087646485 key: test_mcc value: [0.74106548 0.70042867 0.81312325 0.88527041 0.89056356 0.84866842 0.76923077 0.88527041 0.89056356 0.66628253] mean value: 0.8090467064561273 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.86792453 0.8490566 0.90384615 0.94230769 0.94230769 0.92307692 0.88461538 0.94230769 0.94230769 0.82692308] mean value: 0.902467343976778 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.87272727 0.84615385 0.89795918 0.94117647 0.94545455 0.92 0.88461538 0.94117647 0.94545455 0.84210526] mean value: 0.9036822982413429 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.82758621 0.88 0.95652174 0.96 0.89655172 0.95833333 0.88461538 0.96 0.89655172 0.77419355] mean value: 0.8994353660638663 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92307692 0.81481481 0.84615385 0.92307692 1. 0.88461538 0.88461538 0.92307692 1. 0.92307692] mean value: 0.9122507122507123 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86894587 0.8497151 0.90384615 0.94230769 0.94230769 0.92307692 0.88461538 0.94230769 0.94230769 0.82692308] mean value: 0.9026353276353276 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.77419355 0.73333333 0.81481481 0.88888889 0.89655172 0.85185185 0.79310345 0.88888889 0.89655172 0.72727273] mean value: 0.8265450949989326 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01057506 0.01090622 0.01024985 0.0104661 0.01062894 0.01125455 0.01053858 0.01054215 0.01159382 0.01146913] mean value: 0.010822439193725586 key: score_time value: [0.00924444 0.00917506 0.00940752 0.00961065 0.00959969 0.00898838 0.00904393 0.00929976 0.00917506 0.00896525] mean value: 0.009250974655151368 key: test_mcc value: [0.43536101 0.53035501 0.35273781 0.66628253 0.73568294 0.69230769 0.58789635 0.73131034 0.69230769 0.63245553] mean value: 0.605669690697596 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71698113 0.75471698 0.67307692 0.82692308 0.86538462 0.84615385 0.78846154 0.86538462 0.84615385 0.80769231] mean value: 0.7990928882438316 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.69387755 0.72340426 0.63829787 0.80851064 0.85714286 0.84615385 0.76595745 0.8627451 0.84615385 0.82758621] mean value: 0.7869829618172682 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.73913043 0.85 0.71428571 0.9047619 0.91304348 0.84615385 0.85714286 0.88 0.84615385 0.75 ] mean value: 0.8300672081541647 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.65384615 0.62962963 0.57692308 0.73076923 0.80769231 0.84615385 0.69230769 0.84615385 0.84615385 0.92307692] mean value: 0.7552706552706553 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.71581197 0.75712251 0.67307692 0.82692308 0.86538462 0.84615385 0.78846154 0.86538462 0.84615385 0.80769231] mean value: 0.7992165242165242 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.53125 0.56666667 0.46875 0.67857143 0.75 0.73333333 0.62068966 0.75862069 0.73333333 0.70588235] mean value: 0.6547097459673524 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.82980156 1.81833506 1.86908293 1.83907104 1.79196358 1.76676679 1.80318642 1.84923625 1.91046953 1.88992739] mean value: 1.836784052848816 key: score_time value: [0.10081434 0.10086036 0.09425735 0.09559989 0.09269404 0.09352255 0.09992027 0.10102367 0.10104275 0.10122085] mean value: 0.09809560775756836 key: test_mcc value: [0.81688878 0.92450142 0.9258201 0.92307692 0.9258201 0.9258201 0.89056356 0.96225045 0.9258201 0.9258201 ] mean value: 0.914638163414845 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90566038 0.96226415 0.96153846 0.96153846 0.96153846 0.96153846 0.94230769 0.98076923 0.96153846 0.96153846] mean value: 0.9560232220609579 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.96296296 0.96 0.96153846 0.96296296 0.96 0.93877551 0.98039216 0.96296296 0.96296296] mean value: 0.9561648889548049 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.86206897 0.96296296 1. 0.96153846 0.92857143 1. 1. 1. 0.92857143 0.92857143] mean value: 0.9572284675732952 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 0.96296296 0.92307692 0.96153846 1. 0.92307692 0.88461538 0.96153846 1. 1. ] mean value: 0.9578347578347578 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.90669516 0.96225071 0.96153846 0.96153846 0.96153846 0.96153846 0.94230769 0.98076923 0.96153846 0.96153846] mean value: 0.9561253561253562 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.92857143 0.92307692 0.92592593 0.92857143 0.92307692 0.88461538 0.96153846 0.92857143 0.92857143] mean value: 0.9165852665852666 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, 
tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.00343037 0.91479254 0.98317862 0.97180915 0.9955368 1.05391216 0.92968369 0.95286179 1.04950261 0.98119926] mean value: 0.9835906982421875 key: score_time value: [0.20472693 0.27165556 0.25223088 0.27607179 0.22573662 0.22320628 0.12021565 0.27948952 0.23237991 0.22426629] mean value: 0.23099794387817382 key: test_mcc value: [0.81688878 0.77350427 0.9258201 0.92307692 0.9258201 0.88527041 0.89056356 0.96225045 0.9258201 0.88527041] mean value: 0.891428510912105 key: train_mcc value: [0.96588471 0.95309971 0.95744681 0.95320012 0.95748148 0.95320012 0.95744681 0.95320012 0.94893617 0.95320012] mean value: 0.955309616755026 key: test_accuracy value: [0.90566038 0.88679245 0.96153846 0.96153846 0.96153846 0.94230769 0.94230769 0.98076923 0.96153846 0.94230769] mean value: 0.9446298984034833 key: train_accuracy value: [0.98294243 0.97654584 0.9787234 0.97659574 0.9787234 0.97659574 0.9787234 0.97659574 0.97446809 0.97659574] mean value: 0.9776509549516853 key: test_fscore value: [0.90909091 0.88888889 0.96 0.96153846 0.96296296 0.94117647 0.93877551 0.98039216 0.96296296 0.94339623] mean value: 0.9449184549514342 key: train_fscore value: [0.98297872 0.9764454 0.9787234 0.97654584 0.97863248 0.97654584 0.9787234 0.97654584 0.97446809 0.97654584] mean value: 0.9776154860669302 key: test_precision value: [0.86206897 0.88888889 1. 0.96153846 0.92857143 0.96 1. 1. 
0.92857143 0.92592593] mean value: 0.9455565099013374 key: train_precision value: [0.98297872 0.97854077 0.9787234 0.97863248 0.98283262 0.97863248 0.9787234 0.97863248 0.97446809 0.97863248] mean value: 0.9790796922109131 key: test_recall value: [0.96153846 0.88888889 0.92307692 0.96153846 1. 0.92307692 0.88461538 0.96153846 1. 0.96153846] mean value: 0.9465811965811965 key: train_recall value: [0.98297872 0.97435897 0.9787234 0.97446809 0.97446809 0.97446809 0.9787234 0.97446809 0.97446809 0.97446809] mean value: 0.9761593016912166 key: test_roc_auc value: [0.90669516 0.88675214 0.96153846 0.96153846 0.96153846 0.94230769 0.94230769 0.98076923 0.96153846 0.94230769] mean value: 0.9447293447293448 key: train_roc_auc value: [0.98294235 0.97654119 0.9787234 0.97659574 0.9787234 0.97659574 0.9787234 0.97659574 0.97446809 0.97659574] mean value: 0.977650481905801 key: test_jcc value: [0.83333333 0.8 0.92307692 0.92592593 0.92857143 0.88888889 0.88461538 0.96153846 0.92857143 0.89285714] mean value: 0.8967378917378918 key: train_jcc value: [0.9665272 0.9539749 0.95833333 0.95416667 0.958159 0.95416667 0.95833333 0.95416667 0.95020747 0.95416667] mean value: 0.9562201890079111 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02466607 0.01180482 0.01069355 0.01015139 0.01118636 0.01110482 0.01029754 0.01127911 0.01162481 0.0100522 ] mean value: 0.012286067008972168 key: score_time value: [0.01343274 0.01010823 0.00889921 0.00962663 0.00955248 0.00959492 0.00926352 0.00954437 0.01042676 0.00954938] mean value: 0.009999823570251466 key: test_mcc value: [0.73646724 0.50997151 0.6172134 0.88527041 0.69436507 0.69230769 0.69436507 0.77151675 0.69436507 0.69230769] mean value: 0.6988149917958625 key: train_mcc value: [0.74840423 0.76129503 0.71066404 0.74894295 0.75778307 0.77046393 0.71925314 0.68550371 0.74075423 0.75330062] mean value: 0.739636494142086 key: test_accuracy value: [0.86792453 0.75471698 0.80769231 0.94230769 0.84615385 0.84615385 0.84615385 0.88461538 0.84615385 0.84615385] mean value: 0.8488026124818577 key: train_accuracy value: [0.87420043 0.88059701 0.85531915 0.87446809 0.8787234 0.88510638 0.85957447 0.84255319 0.87021277 0.87659574] mean value: 0.8697350632853967 key: test_fscore value: [0.86792453 0.75471698 0.8 0.94117647 0.85185185 0.84615385 0.84 0.88 0.85185185 0.84615385] mean value: 0.8479829376033594 key: train_fscore value: [0.87473461 0.87931034 0.85470085 0.87473461 0.88050314 0.88655462 0.8583691 0.83982684 0.86825054 0.87553648] mean value: 0.869252113965142 key: test_precision value: [0.85185185 0.76923077 0.83333333 0.96 0.82142857 0.84615385 0.875 0.91666667 0.82142857 0.84615385] mean value: 0.8541247456247456 key: train_precision value: [0.87288136 0.88695652 0.8583691 0.87288136 0.8677686 0.87551867 0.86580087 0.85462555 0.88157895 0.88311688] mean 
value: 0.8719497846503439 key: test_recall value: [0.88461538 0.74074074 0.76923077 0.92307692 0.88461538 0.84615385 0.80769231 0.84615385 0.88461538 0.84615385] mean value: 0.8433048433048433 key: train_recall value: [0.87659574 0.87179487 0.85106383 0.87659574 0.89361702 0.89787234 0.85106383 0.82553191 0.85531915 0.86808511] mean value: 0.8667539552645935 key: test_roc_auc value: [0.86823362 0.75498575 0.80769231 0.94230769 0.84615385 0.84615385 0.84615385 0.88461538 0.84615385 0.84615385] mean value: 0.8488603988603989 key: train_roc_auc value: [0.87419531 0.88057829 0.85531915 0.87446809 0.8787234 0.88510638 0.85957447 0.84255319 0.87021277 0.87659574] mean value: 0.8697326786688488 key: test_jcc value: [0.76666667 0.60606061 0.66666667 0.88888889 0.74193548 0.73333333 0.72413793 0.78571429 0.74193548 0.73333333] mean value: 0.7388672679440199 key: train_jcc value: [0.77735849 0.78461538 0.74626866 0.77735849 0.78651685 0.79622642 0.7518797 0.7238806 0.76717557 0.77862595] mean value: 0.7689906114471405 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.0810225 0.0706861 0.15280747 0.10494733 0.06948256 0.08076954 0.07875824 0.08115697 0.0695591 0.0707655 ] mean value: 0.08599553108215333 key: score_time value: [0.01104665 0.010849 0.01349568 0.01159859 0.01075864 0.01105809 0.01134682 0.01259756 0.01232362 0.01064587] mean value: 0.011572051048278808 key: test_mcc value: [0.85164138 0.96291111 0.96225045 0.96225045 0.9258201 0.96225045 0.84866842 0.9258201 0.9258201 0.96225045] mean value: 0.9289683006952936 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 0.98113208 0.98076923 0.98076923 0.96153846 0.98076923 0.92307692 0.96153846 0.96153846 0.98076923] mean value: 0.9636429608127721 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92592593 0.98181818 0.98039216 0.98113208 0.96296296 0.98039216 0.92 0.96 0.96296296 0.98113208] mean value: 0.963671849833892 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 0.96428571 1. 0.96296296 0.92857143 1. 0.95833333 1. 0.92857143 0.96296296] mean value: 0.9598544973544973 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 1. 1. 0.96153846 0.88461538 0.92307692 1. 1. ] mean value: 0.9692307692307692 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.92521368 0.98076923 0.98076923 0.98076923 0.96153846 0.98076923 0.92307692 0.96153846 0.96153846 0.98076923] mean value: 0.9636752136752137 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86206897 0.96428571 0.96153846 0.96296296 0.92857143 0.96153846 0.85185185 0.92307692 0.92857143 0.96296296] mean value: 0.9307429160877436 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.03567243 0.04290986 0.0472014 0.07730103 0.06250048 0.05814052 0.0723269 0.04689837 0.08232975 0.05255485] mean value: 0.05778355598449707 key: score_time value: [0.01217079 0.01232409 0.01893544 0.01794481 0.01220798 0.01889324 0.01251721 0.01215696 0.01258445 0.0193646 ] mean value: 0.014909958839416504 key: test_mcc value: [0.73646724 0.73997003 0.77849894 0.88527041 0.89056356 0.77849894 0.57735027 0.73131034 0.74466871 0.71151247] mean value: 0.7574110914556645 key: train_mcc value: [0.89794254 0.89379475 0.91542421 0.90220118 0.91922384 0.91084449 0.91084449 0.91922384 0.91935705 0.90641581] mean value: 0.909527219993544 key: test_accuracy value: 
[0.86792453 0.86792453 0.88461538 0.94230769 0.94230769 0.88461538 0.78846154 0.86538462 0.86538462 0.84615385] mean value: 0.8755079825834543 key: train_accuracy value: [0.94882729 0.9466951 0.95744681 0.95106383 0.95957447 0.95531915 0.95531915 0.95957447 0.95957447 0.95319149] mean value: 0.9546586217846935 key: test_fscore value: [0.86792453 0.87719298 0.875 0.94339623 0.94545455 0.875 0.79245283 0.8627451 0.87719298 0.86206897] mean value: 0.8778428158828944 key: train_fscore value: [0.94957983 0.94736842 0.958159 0.95137421 0.95983087 0.95578947 0.95578947 0.95983087 0.96 0.95338983] mean value: 0.9551111967481583 key: test_precision value: [0.85185185 0.83333333 0.95454545 0.92592593 0.89655172 0.95454545 0.77777778 0.88 0.80645161 0.78125 ] mean value: 0.8662233135020955 key: train_precision value: [0.93775934 0.93360996 0.94238683 0.94537815 0.95378151 0.94583333 0.94583333 0.95378151 0.95 0.94936709] mean value: 0.945773105762638 key: test_recall value: [0.88461538 0.92592593 0.80769231 0.96153846 1. 
0.80769231 0.80769231 0.84615385 0.96153846 0.96153846] mean value: 0.8964387464387464 key: train_recall value: [0.96170213 0.96153846 0.97446809 0.95744681 0.96595745 0.96595745 0.96595745 0.96595745 0.97021277 0.95744681] mean value: 0.9646644844517185 key: test_roc_auc value: [0.86823362 0.86680912 0.88461538 0.94230769 0.94230769 0.88461538 0.78846154 0.86538462 0.86538462 0.84615385] mean value: 0.8754273504273504 key: train_roc_auc value: [0.94879978 0.94672668 0.95744681 0.95106383 0.95957447 0.95531915 0.95531915 0.95957447 0.95957447 0.95319149] mean value: 0.9546590289143482 key: test_jcc value: [0.76666667 0.78125 0.77777778 0.89285714 0.89655172 0.77777778 0.65625 0.75862069 0.78125 0.75757576] mean value: 0.7846577536448226 key: train_jcc value: [0.904 0.9 0.91967871 0.90725806 0.92276423 0.91532258 0.91532258 0.92276423 0.92307692 0.91093117] mean value: 0.9141118493116434 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0105176 0.01079559 0.01000309 0.00996852 0.00984311 0.00979662 0.01083136 0.00997591 0.00995231 0.01000428] mean value: 0.010168838500976562 key: score_time value: [0.01233459 0.0136776 0.00898337 0.00869012 0.00872898 0.0086844 0.00921631 0.00893378 0.00907898 0.00953197] mean value: 0.009786009788513184 key: test_mcc value: [0.6980057 0.51359557 0.73568294 0.84615385 0.84615385 0.73568294 0.65824263 0.65433031 0.73568294 0.6172134 ] mean value: 0.704074411237807 key: train_mcc value: [0.68485508 0.7273009 0.70669657 0.70654292 0.74910575 0.75745367 0.69818215 0.6730782 0.69863813 0.71087004] mean value: 0.7112723392902409 key: test_accuracy value: [0.8490566 0.75471698 0.86538462 0.92307692 0.92307692 0.86538462 0.82692308 0.82692308 0.86538462 0.80769231] mean value: 0.8507619738751815 key: train_accuracy value: [0.84221748 0.86353945 0.85319149 0.85319149 0.87446809 0.8787234 0.84893617 0.83617021 0.84893617 0.85531915] mean value: 0.8554693099850292 key: test_fscore value: [0.84615385 0.74509804 0.85714286 0.92307692 0.92307692 0.85714286 0.81632653 0.82352941 0.87272727 0.8 ] mean value: 0.8464274660913317 key: train_fscore value: [0.83982684 0.86147186 0.85097192 0.8516129 0.87311828 0.87846482 0.84665227 0.83224401 0.8453159 0.85344828] mean value: 0.8533127081638621 key: test_precision value: [0.84615385 0.79166667 0.91304348 0.92307692 0.92307692 0.91304348 0.86956522 0.84 0.82758621 0.83333333] mean value: 0.8680546073117288 key: train_precision value: [0.85462555 0.87280702 0.86403509 0.86086957 0.8826087 0.88034188 0.85964912 0.85267857 0.86607143 
0.86462882] mean value: 0.8658315740903113 key: test_recall value: [0.84615385 0.7037037 0.80769231 0.92307692 0.92307692 0.80769231 0.76923077 0.80769231 0.92307692 0.76923077] mean value: 0.8280626780626781 key: train_recall value: [0.82553191 0.85042735 0.83829787 0.84255319 0.86382979 0.87659574 0.83404255 0.81276596 0.82553191 0.84255319] mean value: 0.8412129478086925 key: test_roc_auc value: [0.84900285 0.75569801 0.86538462 0.92307692 0.92307692 0.86538462 0.82692308 0.82692308 0.86538462 0.80769231] mean value: 0.8508547008547008 key: train_roc_auc value: [0.84225314 0.86351155 0.85319149 0.85319149 0.87446809 0.8787234 0.84893617 0.83617021 0.84893617 0.85531915] mean value: 0.8554700854700855 key: test_jcc value: [0.73333333 0.59375 0.75 0.85714286 0.85714286 0.75 0.68965517 0.7 0.77419355 0.66666667] mean value: 0.7371884435086604 key: train_jcc value: [0.7238806 0.75665399 0.7406015 0.74157303 0.77480916 0.78326996 0.7340824 0.71268657 0.73207547 0.7443609 ] mean value: 0.7443993587281833 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01451564 0.01753974 0.02073836 0.02360868 0.0248158 0.01841426 0.02023935 0.01849508 0.02454448 0.02476811] mean value: 0.020767951011657716 key: score_time value: [0.01012969 0.01133466 0.01186538 0.01195502 0.01203322 0.01204181 0.01217699 0.0120542 0.01200485 0.01193643] mean value: 0.011753225326538086 key: test_mcc value: [0.77350427 0.70527596 0.74466871 0.88527041 0.85634884 0.6789146 0.77849894 0.79056942 0.88527041 0.69436507] mean value: 0.7792686643983939 key: train_mcc value: [0.87814682 0.88584735 0.88164966 0.92424143 0.93221879 0.84593758 0.87157206 0.76874221 0.86448019 0.85856681] mean value: 0.8711402909907915 key: test_accuracy value: [0.88679245 0.8490566 0.86538462 0.94230769 0.92307692 0.82692308 0.88461538 0.88461538 0.94230769 0.84615385] mean value: 0.8851233671988389 key: train_accuracy value: [0.93816631 0.9424307 0.94042553 0.96170213 0.96595745 0.9212766 0.93404255 0.87446809 0.92978723 0.92553191] mean value: 0.9333788504287075 key: test_fscore value: [0.88461538 0.86206897 0.85106383 0.94117647 0.92857143 0.8 0.875 0.86956522 0.94117647 0.84 ] mean value: 0.8793237767059063 key: train_fscore value: [0.93626374 0.94363257 0.93913043 0.96086957 0.96551724 0.91759465 0.93095768 0.85851319 0.9258427 0.92027335] mean value: 0.9298595118619817 key: test_precision value: [0.88461538 0.80645161 0.95238095 0.96 0.86666667 0.94736842 0.95454545 1. 
0.96 0.875 ] mean value: 0.9207028492164315 key: train_precision value: [0.96818182 0.92244898 0.96 0.98222222 0.97816594 0.96261682 0.97663551 0.98351648 0.98095238 0.99019608] mean value: 0.9704936238209341 key: test_recall value: [0.88461538 0.92592593 0.76923077 0.92307692 1. 0.69230769 0.80769231 0.76923077 0.92307692 0.80769231] mean value: 0.8502849002849003 key: train_recall value: [0.90638298 0.96581197 0.91914894 0.94042553 0.95319149 0.87659574 0.8893617 0.76170213 0.87659574 0.85957447] mean value: 0.8948790689216222 key: test_roc_auc value: [0.88675214 0.84757835 0.86538462 0.94230769 0.92307692 0.82692308 0.88461538 0.88461538 0.94230769 0.84615385] mean value: 0.88497150997151 key: train_roc_auc value: [0.93823422 0.94248045 0.94042553 0.96170213 0.96595745 0.9212766 0.93404255 0.87446809 0.92978723 0.92553191] mean value: 0.9333906164757229 key: test_jcc value: [0.79310345 0.75757576 0.74074074 0.88888889 0.86666667 0.66666667 0.77777778 0.76923077 0.88888889 0.72413793] mean value: 0.7873677535746502 key: train_jcc value: [0.88016529 0.89328063 0.8852459 0.92468619 0.93333333 0.84773663 0.87083333 0.75210084 0.86192469 0.85232068] mean value: 0.8701627509590387 MCC on Blind test: 0.71 Accuracy on Blind test: 0.83 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02051973 0.02308512 0.02264977 0.02195239 0.01951814 0.01922989 0.01749086 0.02073479 0.02035427 0.01736856] mean value: 0.020290350914001463 key: score_time value: [0.01103067 0.01198816 0.01198959 0.0119803 0.0121727 0.01211762 0.01197362 0.01196384 0.01207113 0.01193643] mean value: 0.011922407150268554 key: test_mcc value: [0.81688878 0.68308228 0.80829038 0.75878691 0.84615385 0.81312325 0.6789146 0.84866842 0.84866842 0.74466871] mean value: 0.7847245604589297 key: train_mcc value: [0.82318874 0.80844901 0.89143025 0.73855496 0.85379422 0.90278998 0.67317249 0.89094414 0.89427309 0.82331429] mean value: 0.8299911175592818 key: test_accuracy value: [0.90566038 0.83018868 0.90384615 0.86538462 0.92307692 0.90384615 0.82692308 0.92307692 0.92307692 0.86538462] mean value: 0.8870464441219158 key: train_accuracy value: [0.90618337 0.89765458 0.94468085 0.85531915 0.92340426 0.95106383 0.81489362 0.94468085 0.94680851 0.90851064] mean value: 0.9093199655219344 key: test_fscore value: [0.90909091 0.85245902 0.90566038 0.88135593 0.92307692 0.89795918 0.8 0.92 0.92592593 0.85106383] mean value: 0.8866592097509784 key: train_fscore value: [0.91338583 0.90588235 0.94650206 0.87265918 0.91818182 0.9519833 0.7751938 0.94298246 0.94780793 0.90249433] mean value: 0.9077073048926279 key: test_precision value: [0.86206897 0.76470588 0.88888889 0.78787879 0.92307692 0.95652174 0.94736842 0.95833333 0.89285714 0.95238095] mean value: 0.8934081036469277 key: train_precision value: [0.84981685 0.83695652 0.91633466 0.77926421 0.98536585 0.93442623 
0.98684211 0.97285068 0.93032787 0.96601942] mean value: 0.9158204400448495 key: test_recall value: [0.96153846 0.96296296 0.92307692 1. 0.92307692 0.84615385 0.69230769 0.88461538 0.96153846 0.76923077] mean value: 0.8924501424501424 key: train_recall value: [0.98723404 0.98717949 0.9787234 0.99148936 0.85957447 0.97021277 0.63829787 0.91489362 0.96595745 0.84680851] mean value: 0.9140370976541189 key: test_roc_auc value: [0.90669516 0.82763533 0.90384615 0.86538462 0.92307692 0.90384615 0.82692308 0.92307692 0.92307692 0.86538462] mean value: 0.8868945868945869 key: train_roc_auc value: [0.90601018 0.89784506 0.94468085 0.85531915 0.92340426 0.95106383 0.81489362 0.94468085 0.94680851 0.90851064] mean value: 0.9093216948536097 key: test_jcc value: [0.83333333 0.74285714 0.82758621 0.78787879 0.85714286 0.81481481 0.66666667 0.85185185 0.86206897 0.74074074] mean value: 0.7984941367699988 key: train_jcc value: [0.84057971 0.82795699 0.8984375 0.77408638 0.8487395 0.90836653 0.63291139 0.89211618 0.90079365 0.82231405] mean value: 0.8346301883150747 MCC on Blind test: 0.69 Accuracy on Blind test: 0.83 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18447971 0.18334126 0.18150234 0.1833899 0.18306136 0.18128395 0.18111897 0.17938328 0.18043876 0.18003964] mean value: 0.18180391788482667 key: score_time value: [0.015414 0.01569867 0.0158639 0.01544142 0.01620245 0.01581216 0.01540232 0.01535916 0.01594925 0.01538944] mean value: 0.015653276443481447 key: test_mcc value: [0.85164138 0.96291111 0.96225045 0.96225045 0.9258201 0.96225045 0.81312325 0.96225045 0.9258201 0.92307692] mean value: 0.9251394652761287 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 0.98113208 0.98076923 0.98076923 0.96153846 0.98076923 0.90384615 0.98076923 0.96153846 0.96153846] mean value: 0.9617198838896952 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92592593 0.98181818 0.98039216 0.98113208 0.96296296 0.98039216 0.89795918 0.98039216 0.96296296 0.96153846] mean value: 0.9615476224941898 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 0.96428571 1. 0.96296296 0.92857143 1. 0.95652174 1. 0.92857143 0.96153846] mean value: 0.9595308877917573 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 1. 1. 0.96153846 0.84615385 0.96153846 1. 0.96153846] mean value: 0.9653846153846154 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.92521368 0.98076923 0.98076923 0.98076923 0.96153846 0.98076923 0.90384615 0.98076923 0.96153846 0.96153846] mean value: 0.9617521367521368 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86206897 0.96428571 0.96153846 0.96296296 0.92857143 0.96153846 0.81481481 0.96153846 0.92857143 0.92592593] mean value: 0.9271816625264901 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.9 Accuracy on Blind test: 0.95 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06588101 0.05572891 0.06235313 0.06345272 0.08077598 0.07156444 0.08671784 0.07806897 0.08278823 0.08851504] mean value: 0.07358462810516357 key: score_time value: [0.02843833 0.02826238 0.02912283 0.02685213 0.03997207 0.02721906 0.03725529 0.02411723 0.03921318 0.03339529] mean value: 0.03138477802276611 key: test_mcc value: [0.85164138 0.92450142 0.9258201 0.96225045 0.9258201 0.96225045 0.84866842 0.88527041 0.89056356 0.96225045] mean value: 0.9139036744202026 key: train_mcc value: [0.98721586 0.98721563 0.98312115 0.97873227 0.9957537 0.9873145 0.98724298 0.99152527 0.97478586 0.98297872] mean value: 0.9855885925245994 key: test_accuracy value: [0.9245283 0.96226415 0.96153846 0.98076923 0.96153846 0.98076923 0.92307692 0.94230769 0.94230769 0.98076923] mean value: 0.9559869375907112 key: train_accuracy value: [0.99360341 0.99360341 0.99148936 0.9893617 0.99787234 0.99361702 0.99361702 0.99574468 0.98723404 0.99148936] mean value: 0.9927632354942613 key: test_fscore value: [0.92592593 0.96296296 0.96 
0.98113208 0.96296296 0.98039216 0.92 0.94339623 0.94545455 0.98113208] mean value: 0.9563358931527632 key: train_fscore value: [0.99360341 0.99357602 0.99141631 0.98933902 0.9978678 0.99357602 0.99363057 0.9957265 0.98739496 0.99148936] mean value: 0.9927619966475919 key: test_precision value: [0.89285714 0.96296296 1. 0.96296296 0.92857143 1. 0.95833333 0.92592593 0.89655172 0.96296296] mean value: 0.949112844371465 key: train_precision value: [0.9957265 0.99570815 1. 0.99145299 1. 1. 0.99152542 1. 0.97510373 0.99148936] mean value: 0.99410061615567 key: test_recall value: [0.96153846 0.96296296 0.92307692 1. 1. 0.96153846 0.88461538 0.96153846 1. 1. ] mean value: 0.9655270655270656 key: train_recall value: [0.99148936 0.99145299 0.98297872 0.98723404 0.99574468 0.98723404 0.99574468 0.99148936 1. 0.99148936] mean value: 0.9914857246772141 key: test_roc_auc value: [0.92521368 0.96225071 0.96153846 0.98076923 0.96153846 0.98076923 0.92307692 0.94230769 0.94230769 0.98076923] mean value: 0.9560541310541311 key: train_roc_auc value: [0.99360793 0.99359884 0.99148936 0.9893617 0.99787234 0.99361702 0.99361702 0.99574468 0.98723404 0.99148936] mean value: 0.9927632296781232 key: test_jcc value: [0.86206897 0.92857143 0.92307692 0.96296296 0.92857143 0.96153846 0.85185185 0.89285714 0.89655172 0.96296296] mean value: 0.9171013852048335 key: train_jcc value: [0.98728814 0.98723404 0.98297872 0.97890295 0.99574468 0.98723404 0.98734177 0.99148936 0.97510373 0.98312236] mean value: 0.9856439809704479 MCC on Blind test: 0.9 Accuracy on Blind test: 0.95 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.14703774 0.11411095 0.19120908 0.13331914 0.13890243 0.21100497 0.18627715 0.17440438 0.17817092 0.1670382 ] mean value: 0.1641474962234497 key: score_time value: [0.0247128 0.0149591 0.02889299 0.01530313 0.01501536 0.02427626 0.02445054 0.0240407 0.02411985 0.02399731] mean value: 0.021976804733276366 key: test_mcc value: [0.73646724 0.50997151 0.50336201 0.77151675 0.6789146 0.65433031 0.69436507 0.81312325 0.81312325 0.53846154] mean value: 0.6713635523699482 key: train_mcc value: [0.98728791 0.99150708 0.9873145 0.9873145 0.9873145 0.99152527 0.9873145 0.9873145 0.9873145 0.9873145 ] mean value: 0.9881521740698564 key: test_accuracy value: [0.86792453 0.75471698 0.75 0.88461538 0.82692308 0.82692308 0.84615385 0.90384615 0.90384615 0.76923077] mean value: 0.8334179970972424 key: train_accuracy value: [0.99360341 0.99573561 0.99361702 0.99361702 0.99361702 0.99574468 0.99361702 0.99361702 0.99361702 0.99361702] mean value: 0.9940402848977 key: test_fscore value: [0.86792453 0.75471698 0.73469388 0.88 0.84745763 0.82352941 0.84 0.89795918 0.90909091 0.76923077] mean value: 0.832460328786348 key: train_fscore value: [0.99357602 0.99570815 0.99357602 0.99357602 0.99357602 0.9957265 0.99357602 0.99357602 0.99357602 0.99357602] mean value: 0.9940042787277901 key: test_precision value: [0.85185185 0.76923077 0.7826087 0.91666667 0.75757576 0.84 0.875 0.95652174 0.86206897 0.76923077] mean value: 0.8380755214855664 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.88461538 0.74074074 0.69230769 0.84615385 0.96153846 0.80769231 0.80769231 0.84615385 0.96153846 0.76923077] mean value: 0.8317663817663817 key: train_recall value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936 0.98723404 0.98723404 0.98723404 0.98723404] mean value: 0.9880814693580651 key: test_roc_auc value: [0.86823362 0.75498575 0.75 0.88461538 0.82692308 0.82692308 0.84615385 0.90384615 0.90384615 0.76923077] mean value: 0.8334757834757835 key: train_roc_auc value: [0.99361702 0.9957265 0.99361702 0.99361702 0.99361702 0.99574468 0.99361702 0.99361702 0.99361702 0.99361702] mean value: 0.9940407346790325 key: test_jcc value: [0.76666667 0.60606061 0.58064516 0.78571429 0.73529412 0.7 0.72413793 0.81481481 0.83333333 0.625 ] mean value: 0.7171666916561571 key: train_jcc value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936 0.98723404 0.98723404 0.98723404 0.98723404] mean value: 0.9880814693580651 MCC on Blind test: 0.62 Accuracy on Blind test: 0.81 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.73474646 0.72688985 0.72868824 0.73179102 0.73014355 0.73290229 0.73310161 0.72938943 0.7374897 0.73583198] mean value: 0.7320974111557007 key: score_time value: [0.00944114 0.00925851 0.00926185 0.00938869 0.00955248 0.00936866 0.00937533 0.00944853 0.0102365 0.00951147] mean value: 0.009484314918518066 key: test_mcc value: [0.85164138 0.92704716 0.96225045 0.96225045 0.9258201 0.96225045 0.88527041 0.92307692 0.89056356 0.96225045] mean value: 0.9252421331587133 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 0.96226415 0.98076923 0.98076923 0.96153846 0.98076923 0.94230769 0.96153846 0.94230769 0.98076923] mean value: 0.9617561683599419 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92592593 0.96428571 0.98039216 0.98113208 0.96296296 0.98039216 0.94117647 0.96153846 0.94545455 0.98113208] mean value: 0.9624392545424731 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 0.93103448 1. 0.96296296 0.92857143 1. 0.96 0.96153846 0.89655172 0.96296296] mean value: 0.9496479165789511 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 1. 1. 0.96153846 0.92307692 0.96153846 1. 1. ] mean value: 0.9769230769230769 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.92521368 0.96153846 0.98076923 0.98076923 0.96153846 0.98076923 0.94230769 0.96153846 0.94230769 0.98076923] mean value: 0.9617521367521368 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86206897 0.93103448 0.96153846 0.96296296 0.92857143 0.96153846 0.88888889 0.92592593 0.89655172 0.96296296] mean value: 0.9282044264802886 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.9 Accuracy on Blind test: 0.95 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03148365 0.05373335 0.05133128 0.03273869 0.03149581 0.03127074 0.03117824 0.03164434 0.03118968 0.03135705] mean value: 0.03574228286743164 key: score_time value: [0.01274085 0.01517105 0.01338124 0.01331782 0.01315212 0.01508641 0.01483965 0.01540351 0.014956 0.01507926] mean value: 0.01431279182434082 key: test_mcc value: [0.48187381 0.35897436 0.6172134 0.73131034 0.4259217 0.39528471 0.35273781 0.50951017 0.5990423 0.38575837] mean value: 0.48576269695469176 key: train_mcc value: [0.86418083 0.95749365 0.97029183 0.95361464 0.92156343 0.80568158 0.86066297 0.97873227 0.79494933 0.926125 ] mean value: 0.9033295534903398 key: test_accuracy value: [0.73584906 0.67924528 0.80769231 0.86538462 0.71153846 0.69230769 0.67307692 0.75 
0.78846154 0.69230769] mean value: 0.7395863570391872 key: train_accuracy value: [0.92750533 0.97867804 0.98510638 0.97659574 0.95957447 0.89361702 0.92553191 0.9893617 0.88723404 0.96170213] mean value: 0.9484906773125255 key: test_fscore value: [0.69565217 0.67924528 0.8 0.86792453 0.69387755 0.65217391 0.63829787 0.77192982 0.75555556 0.68 ] mean value: 0.7234656701755069 key: train_fscore value: [0.92201835 0.97844828 0.98501071 0.9769392 0.9580574 0.88095238 0.91954023 0.98938429 0.87290168 0.96017699] mean value: 0.9443429499014124 key: test_precision value: [0.8 0.69230769 0.83333333 0.85185185 0.73913043 0.75 0.71428571 0.70967742 0.89473684 0.70833333] mean value: 0.7693656621354635 key: train_precision value: [1. 0.98695652 0.99137931 0.96280992 0.99541284 1. 1. 0.98728814 1. 1. ] mean value: 0.9923846729069248 key: test_recall value: [0.61538462 0.66666667 0.76923077 0.88461538 0.65384615 0.57692308 0.57692308 0.84615385 0.65384615 0.65384615] mean value: 0.6897435897435897 key: train_recall value: [0.85531915 0.97008547 0.9787234 0.99148936 0.92340426 0.78723404 0.85106383 0.99148936 0.77446809 0.92340426] mean value: 0.9046681214766321 key: test_roc_auc value: [0.73361823 0.67948718 0.80769231 0.86538462 0.71153846 0.69230769 0.67307692 0.75 0.78846154 0.69230769] mean value: 0.7393874643874644 key: train_roc_auc value: [0.92765957 0.97865976 0.98510638 0.97659574 0.95957447 0.89361702 0.92553191 0.9893617 0.88723404 0.96170213] mean value: 0.9485042735042735 key: test_jcc value: [0.53333333 0.51428571 0.66666667 0.76666667 0.53125 0.48387097 0.46875 0.62857143 0.60714286 0.51515152] mean value: 0.5715689149560117 key: train_jcc value: [0.85531915 0.95780591 0.97046414 0.95491803 0.91949153 0.78723404 0.85106383 0.9789916 0.77446809 0.92340426] mean value: 0.897316055874549 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02873778 0.03986144 0.03904796 0.03797555 0.03872371 0.03989911 0.04019141 0.05784249 0.0290029 0.0340817 ] mean value: 0.03853640556335449 key: score_time value: [0.02153182 0.02466536 0.01894593 0.01888561 0.02026582 0.01964355 0.01979423 0.03150344 0.03011656 0.02050829] mean value: 0.0225860595703125 key: test_mcc value: [0.85164138 0.70527596 0.77849894 0.92307692 0.89056356 0.74466871 0.73131034 0.88527041 0.81312325 0.77849894] mean value: 0.8101928417182709 key: train_mcc value: [0.86403192 0.87219919 0.86461295 0.86411148 0.86828166 0.85995606 0.88136192 0.87262489 0.86847048 0.86395495] mean value: 0.8679605508568666 key: test_accuracy value: [0.9245283 0.8490566 0.88461538 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.9023584905660378 key: train_accuracy value: [0.93176972 0.93603412 0.93191489 0.93191489 0.93404255 0.92978723 0.94042553 0.93617021 0.93404255 0.93191489] mean value: 0.9338016603910538 key: test_fscore value: [0.92592593 0.86206897 0.875 0.96153846 0.94545455 0.85106383 0.86792453 0.94117647 0.90909091 0.89285714] mean value: 0.9032100779061583 key: train_fscore value: [0.93305439 0.93644068 0.93333333 0.93277311 0.93473684 0.93081761 0.94142259 0.93697479 0.93501048 0.93248945] mean value: 0.9347053283732041 key: 
test_precision value: [0.89285714 0.80645161 0.95454545 0.96153846 0.89655172 0.95238095 0.85185185 0.96 0.86206897 0.83333333] mean value: 0.8971579499065595 key: train_precision value: [0.91769547 0.92857143 0.91428571 0.92116183 0.925 0.91735537 0.92592593 0.9253112 0.9214876 0.92468619] mean value: 0.9221480738754971 key: test_recall value: [0.96153846 0.92592593 0.80769231 0.96153846 1. 0.76923077 0.88461538 0.92307692 0.96153846 0.96153846] mean value: 0.9156695156695157 key: train_recall value: [0.94893617 0.94444444 0.95319149 0.94468085 0.94468085 0.94468085 0.95744681 0.94893617 0.94893617 0.94042553] mean value: 0.9476359338061466 key: test_roc_auc value: [0.92521368 0.84757835 0.88461538 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.9022792022792023 key: train_roc_auc value: [0.93173304 0.93605201 0.93191489 0.93191489 0.93404255 0.92978723 0.94042553 0.93617021 0.93404255 0.93191489] mean value: 0.9337997817785052 key: test_jcc value: [0.86206897 0.75757576 0.77777778 0.92592593 0.89655172 0.74074074 0.76666667 0.88888889 0.83333333 0.80645161] mean value: 0.8255981393467489 key: train_jcc value: [0.8745098 0.88047809 0.875 0.87401575 0.87747036 0.87058824 0.88932806 0.88142292 0.87795276 0.87351779] mean value: 0.877428376123688 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.14161825 0.28556347 0.29658747 0.24531269 0.1590662 0.27728081 0.2977221 0.21515226 0.22903538 0.18153119] mean value: 0.23288698196411134 key: score_time value: [0.0169847 0.02052855 0.02147937 0.02026939 0.01914334 0.02140737 0.02591467 0.01721787 0.02034116 0.01228237] mean value: 0.019556879997253418 key: test_mcc value: [0.85164138 0.62867836 0.74466871 0.92307692 0.89056356 0.74466871 0.73131034 0.88527041 0.81312325 0.77849894] mean value: 0.7991500590969782 key: train_mcc value: [0.86403192 0.80817284 0.80058734 0.86411148 0.86828166 0.85995606 0.88136192 0.87262489 0.86847048 0.86395495] mean value: 0.855155354590099 key: test_accuracy value: [0.9245283 0.81132075 0.86538462 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.8966618287373005 key: train_accuracy value: [0.93176972 0.90405117 0.9 0.93191489 0.93404255 0.92978723 0.94042553 0.93617021 0.93404255 0.93191489] mean value: 0.9274118767862813 key: test_fscore value: [0.92592593 0.82758621 0.85106383 0.96153846 0.94545455 0.85106383 0.86792453 0.94117647 0.90909091 0.89285714] mean value: 0.8973681850228127 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:128: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:131: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.93305439 0.9044586 0.90187891 0.93277311 0.93473684 0.93081761 0.94142259 0.93697479 0.93501048 0.93248945] mean value: 0.9283616785563731 key: test_precision value: [0.89285714 0.77419355 0.95238095 0.96153846 0.89655172 0.95238095 0.85185185 0.96 0.86206897 0.83333333] mean value: 0.8937156932384963 key: train_precision value: [0.91769547 0.89873418 0.8852459 0.92116183 0.925 0.91735537 0.92592593 0.9253112 0.9214876 0.92468619] mean value: 0.9162603674752363 key: test_recall value: [0.96153846 0.88888889 0.76923077 0.96153846 1. 
0.76923077 0.88461538 0.92307692 0.96153846 0.96153846] mean value: 0.9081196581196581 key: train_recall value: [0.94893617 0.91025641 0.91914894 0.94468085 0.94468085 0.94468085 0.95744681 0.94893617 0.94893617 0.94042553] mean value: 0.9408128750681942 key: test_roc_auc value: [0.92521368 0.80982906 0.86538462 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.8965811965811966 key: train_roc_auc value: [0.93173304 0.90406438 0.9 0.93191489 0.93404255 0.92978723 0.94042553 0.93617021 0.93404255 0.93191489] mean value: 0.9274095290052737 key: test_jcc value: [0.86206897 0.70588235 0.74074074 0.92592593 0.89655172 0.74074074 0.76666667 0.88888889 0.83333333 0.80645161] mean value: 0.8167250951795871 key: train_jcc value: [0.8745098 0.8255814 0.82129278 0.87401575 0.87747036 0.87058824 0.88932806 0.88142292 0.87795276 0.87351779] mean value: 0.8665679844601714 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03287292 0.03884459 0.03730083 0.0362246 0.03666592 0.03770685 0.03795171 0.03666735 0.04778528 0.03587842] mean value: 0.03778984546661377 key: score_time value: [0.01219273 0.01410508 0.03120208 0.01247478 0.01239181 0.01472044 0.01600051 0.01480579 0.01244426 0.01247621] mean value: 0.015281367301940917 key: test_mcc value: [0.85164138 0.73997003 0.77849894 0.96225045 0.89056356 0.74466871 0.80829038 0.88527041 0.84866842 0.79056942] mean value: 0.8300391695672018 key: train_mcc value: [0.8593409 0.87640715 0.86411148 0.85113319 0.86815585 0.86395495 0.87246682 0.8597691 0.8597691 0.86386107] mean value: 0.8638969612781384 key: test_accuracy value: [0.9245283 0.86792453 0.88461538 0.98076923 0.94230769 0.86538462 0.90384615 0.94230769 0.92307692 0.88461538] mean value: 0.9119375907111756 key: train_accuracy value: [0.92963753 0.93816631 0.93191489 0.92553191 0.93404255 0.93191489 0.93617021 0.92978723 0.92978723 0.93191489] mean value: 0.9318867667740326 key: test_fscore value: [0.92592593 0.87719298 0.875 0.98113208 0.94545455 0.85106383 0.90566038 0.94117647 0.92592593 0.89655172] mean value: 0.9125083857106127 key: train_fscore value: [0.93023256 0.93842887 0.93277311 0.92600423 0.93446089 0.93248945 0.93670886 0.93052632 0.93052632 0.93162393] mean value: 0.9323774533836076 key: test_precision value: [0.89285714 0.83333333 0.95454545 0.96296296 0.89655172 0.95238095 0.88888889 0.96 0.89285714 0.8125 ] mean value: 0.9046877601963809 key: train_precision value: [0.92436975 0.93248945 0.92116183 0.92016807 0.92857143 0.92468619 0.92887029 
0.92083333 0.92083333 0.93562232] mean value: 0.9257605990519295 key: test_recall value: [0.96153846 0.92592593 0.80769231 1. 1. 0.76923077 0.92307692 0.92307692 0.96153846 1. ] mean value: 0.9272079772079772 key: train_recall value: [0.93617021 0.94444444 0.94468085 0.93191489 0.94042553 0.94042553 0.94468085 0.94042553 0.94042553 0.92765957] mean value: 0.9391252955082743 key: test_roc_auc value: [0.92521368 0.86680912 0.88461538 0.98076923 0.94230769 0.86538462 0.90384615 0.94230769 0.92307692 0.88461538] mean value: 0.9118945868945869 key: train_roc_auc value: [0.92962357 0.93817967 0.93191489 0.92553191 0.93404255 0.93191489 0.93617021 0.92978723 0.92978723 0.93191489] mean value: 0.9318867066739408 key: test_jcc value: [0.86206897 0.78125 0.77777778 0.96296296 0.89655172 0.74074074 0.82758621 0.88888889 0.86206897 0.8125 ] mean value: 0.8412396232439335 key: train_jcc value: [0.86956522 0.884 0.87401575 0.86220472 0.87698413 0.87351779 0.88095238 0.87007874 0.87007874 0.872 ] mean value: 0.8733397464644983 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.99820614 0.89078689 1.11161804 0.91593313 0.99522948 1.27120042 1.16889167 0.97342539 0.93785477 0.97674584]
mean value: 1.0239891767501832

key: score_time
value: [0.01472306 0.01496816 0.01541257 0.01507759 0.01992369 0.01551318 0.01557875 0.0176332 0.01491976 0.01504946]
mean value: 0.0158799409866333

key: test_mcc
value: [0.81196581 0.8116984 0.77849894 0.96225045 0.89056356 0.77849894 0.84615385 0.84866842 0.84866842 0.82305489]
mean value: 0.8400021693785612

key: train_mcc
value: [0.91045482 0.91484796 0.90667855 0.89790486 0.91064654 0.90233192 0.90252815 0.90233192 0.88965172 0.90220118]
mean value: 0.9039577621659811

key: test_accuracy
value: [0.90566038 0.90566038 0.88461538 0.98076923 0.94230769 0.88461538 0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.9176705370101597

key: train_accuracy
value: [0.95522388 0.95735608 0.95319149 0.94893617 0.95531915 0.95106383 0.95106383 0.95106383 0.94468085 0.95106383]
mean value: 0.9518962936079481

key: test_fscore
value: [0.90566038 0.90909091 0.875 0.98113208 0.94545455 0.875 0.92307692 0.92 0.92592593 0.9122807 ]
mean value: 0.9172621458132878

key: train_fscore
value: [0.95541401 0.95762712 0.95378151 0.94915254 0.95541401 0.95157895 0.95178197 0.95157895 0.94537815 0.95137421]
mean value: 0.95230814229351

key: test_precision
value: [0.88888889 0.89285714 0.95454545 0.96296296 0.89655172 0.95454545 0.92307692 0.95833333 0.89285714 0.83870968]
mean value: 0.9163328704624589

key: train_precision
value: [0.95338983 0.94957983 0.94190871 0.94514768 0.95338983 0.94166667 0.93801653 0.94166667 0.93360996 0.94537815]
mean value: 0.9443753857993245

key: test_recall
value: [0.92307692 0.92592593 0.80769231 1. 1. 0.80769231 0.92307692 0.88461538 0.96153846 1. ]
mean value: 0.9233618233618234

key: train_recall
value: [0.95744681 0.96581197 0.96595745 0.95319149 0.95744681 0.96170213 0.96595745 0.96170213 0.95744681 0.95744681]
mean value: 0.9604109838152391

key: test_roc_auc
value: [0.90598291 0.90527066 0.88461538 0.98076923 0.94230769 0.88461538 0.92307692 0.92307692 0.92307692 0.90384615]
mean value: 0.9176638176638177

key: train_roc_auc
value: [0.95521913 0.95737407 0.95319149 0.94893617 0.95531915 0.95106383 0.95106383 0.95106383 0.94468085 0.95106383]
mean value: 0.9518976177486816

key: test_jcc
value: [0.82758621 0.83333333 0.77777778 0.96296296 0.89655172 0.77777778 0.85714286 0.85185185 0.86206897 0.83870968]
mean value: 0.848576313481764

key: train_jcc
value: [0.91463415 0.91869919 0.91164659 0.90322581 0.91463415 0.90763052 0.908 0.90763052 0.89641434 0.90725806]
mean value: 0.9089773323794109

MCC on Blind test: 0.77
Accuracy on Blind test: 0.88

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01577473 0.01172471 0.01135039 0.01113343 0.01135993 0.01071382 0.01026487 0.01151204 0.01141429 0.01121664] mean value: 0.011646485328674317 key: score_time value: [0.01278234 0.01041102 0.00982523 0.00985003 0.01006365 0.00986242 0.00955105 0.01019549 0.00995421 0.01008344] mean value: 0.010257887840270995 key: test_mcc value: [0.66048569 0.40912228 0.74466871 0.77151675 0.80829038 0.62279916 0.57735027 0.54006172 0.54006172 0.73568294] mean value: 0.6410039620540624 key: train_mcc value: [0.66639366 0.7005426 0.701239 0.68145013 0.69523029 0.7097907 0.66017245 0.67359644 0.68812845 0.69162595] mean value: 0.6868169671603843 key: test_accuracy value: [0.83018868 0.69811321 0.86538462 0.88461538 0.90384615 0.80769231 0.78846154 0.76923077 0.76923077 0.86538462] mean value: 0.8182148040638607 key: train_accuracy value: [0.8315565 0.84861407 0.84893617 0.83829787 0.84680851 0.85319149 0.82978723 0.83404255 0.84255319 0.84468085] mean value: 0.8418468448033389 key: test_fscore value: [0.82352941 0.66666667 0.85106383 0.88 0.90196078 0.79166667 0.78431373 0.77777778 0.76 0.85714286] mean value: 0.809412171960983 key: train_fscore value: [0.82326622 0.84044944 0.84116331 0.8280543 0.84140969 0.84563758 0.83333333 0.82272727 0.83482143 0.83813747] mean value: 0.8349000049484545 key: test_precision value: [0.84 0.76190476 0.95238095 0.91666667 0.92 0.86363636 0.8 0.75 0.79166667 0.91304348] mean value: 0.850929888951628 key: train_precision value: [0.86792453 0.88625592 0.88679245 0.88405797 0.87214612 0.89150943 0.81632653 0.88292683 0.87793427 0.875 ] mean value: 
0.8740874061181917 key: test_recall value: [0.80769231 0.59259259 0.76923077 0.84615385 0.88461538 0.73076923 0.76923077 0.80769231 0.73076923 0.80769231] mean value: 0.7746438746438746 key: train_recall value: [0.78297872 0.7991453 0.8 0.7787234 0.81276596 0.80425532 0.85106383 0.77021277 0.79574468 0.80425532] mean value: 0.7999145299145299 key: test_roc_auc value: [0.82977208 0.70014245 0.86538462 0.88461538 0.90384615 0.80769231 0.78846154 0.76923077 0.76923077 0.86538462] mean value: 0.8183760683760685 key: train_roc_auc value: [0.8316603 0.84850882 0.84893617 0.83829787 0.84680851 0.85319149 0.82978723 0.83404255 0.84255319 0.84468085] mean value: 0.8418466993998909 key: test_jcc value: [0.7 0.5 0.74074074 0.78571429 0.82142857 0.65517241 0.64516129 0.63636364 0.61290323 0.75 ] mean value: 0.684748416416937 key: train_jcc value: [0.69961977 0.7248062 0.72586873 0.70656371 0.72623574 0.73255814 0.71428571 0.6988417 0.7164751 0.72137405] mean value: 0.7166628841540069 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01205468 0.0115869 0.01135087 0.01066089 0.01150608 0.01090503 0.01032591 0.01174426 0.01039815 0.01049948] mean value: 0.011103224754333497 key: score_time value: [0.01041389 0.00969172 0.0097506 0.00891495 0.00965595 0.00975871 0.00914407 0.00987816 0.00919986 0.00920725] mean value: 0.009561514854431153 key: test_mcc value: [0.73646724 0.47360961 0.65433031 0.88527041 0.69436507 0.69230769 0.65824263 0.77151675 0.69436507 0.65433031] mean value: 0.6914805091639882 key: train_mcc value: [0.73140924 0.75708961 0.72356805 0.74043224 0.76629748 0.77032436 0.70276422 0.67337154 0.73659716 0.75330062] mean value: 0.7355154534366981 key: test_accuracy value: [0.86792453 0.73584906 0.82692308 0.94230769 0.84615385 0.84615385 0.82692308 0.88461538 0.84615385 0.82692308] mean value: 0.8449927431059506 key: train_accuracy value: [0.86567164 0.87846482 0.86170213 0.87021277 0.88297872 0.88510638 0.85106383 0.83617021 0.86808511 0.87659574] mean value: 0.8676051354171392 key: test_fscore value: [0.86792453 0.73076923 0.82352941 0.94117647 0.85185185 0.84615385 0.81632653 0.88 0.85185185 0.82352941] mean value: 0.843311313365856 key: train_fscore value: [0.86509636 0.87688985 0.86021505 0.87048832 0.88469602 0.88607595 0.84782609 0.83150985 0.86580087 0.87553648] mean value: 0.8664134831445992 key: test_precision value: [0.85185185 0.76 0.84 0.96 0.82142857 0.84615385 0.86956522 0.91666667 0.82142857 0.84 ] mean value: 0.8527094724920812 key: train_precision value: [0.87068966 0.88646288 0.86956522 0.86864407 0.87190083 0.87866109 0.86666667 0.85585586 0.88105727 0.88311688] mean 
value: 0.873262041113066 key: test_recall value: [0.88461538 0.7037037 0.80769231 0.92307692 0.88461538 0.84615385 0.76923077 0.84615385 0.88461538 0.80769231] mean value: 0.8357549857549857 key: train_recall value: [0.85957447 0.86752137 0.85106383 0.87234043 0.89787234 0.89361702 0.82978723 0.80851064 0.85106383 0.86808511] mean value: 0.8599436261138389 key: test_roc_auc value: [0.86823362 0.73646724 0.82692308 0.94230769 0.84615385 0.84615385 0.82692308 0.88461538 0.84615385 0.82692308] mean value: 0.8450854700854701 key: train_roc_auc value: [0.86568467 0.87844153 0.86170213 0.87021277 0.88297872 0.88510638 0.85106383 0.83617021 0.86808511 0.87659574] mean value: 0.8676041098381524 key: test_jcc value: [0.76666667 0.57575758 0.7 0.88888889 0.74193548 0.73333333 0.68965517 0.78571429 0.74193548 0.7 ] mean value: 0.7323886890516479 key: train_jcc value: [0.76226415 0.78076923 0.75471698 0.77067669 0.79323308 0.79545455 0.73584906 0.71161049 0.76335878 0.77862595] mean value: 0.7646558959054925 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00998068 0.01181459 0.01135254 0.01113796 0.01147676 0.01108146 0.01116586 0.01172829 0.01143646 0.01152492] mean value: 0.011269950866699218 key: score_time value: [0.0131557 0.01342392 0.01350808 0.01317334 0.01755738 0.01329255 0.0138526 0.01361537 0.01634336 0.01371002] mean value: 0.01416323184967041 key: test_mcc value: [0.54793065 0.3223969 0.34641016 0.56591646 0.62279916 0.4233902 0.43929769 0.73568294 0.65824263 0.27104108] mean value: 0.49331078658150723 key: train_mcc value: [0.7231531 0.71458471 0.73192152 0.69364214 0.71917498 0.71066404 0.71495188 0.68936794 0.72008837 0.73208062] mean value: 0.7149629304142088 key: test_accuracy value: [0.77358491 0.66037736 0.67307692 0.76923077 0.80769231 0.71153846 0.71153846 0.86538462 0.82692308 0.63461538] mean value: 0.7433962264150944 key: train_accuracy value: [0.86140725 0.85714286 0.86595745 0.84680851 0.85957447 0.85531915 0.85744681 0.84468085 0.85957447 0.86595745] mean value: 0.8573869255545978 key: test_fscore value: [0.76 0.65384615 0.67924528 0.72727273 0.82142857 0.71698113 0.66666667 0.87272727 0.81632653 0.6122449 ] mean value: 0.732673923560716 key: train_fscore value: [0.85961123 0.85466377 0.86567164 0.84615385 0.85897436 0.85470085 0.85653105 0.84434968 0.8558952 0.86451613] mean value: 0.8561067762085006 key: test_precision value: [0.79166667 0.68 0.66666667 0.88888889 0.76666667 0.7037037 0.78947368 0.82758621 0.86956522 0.65217391] mean value: 0.7636391614134453 key: train_precision value: [0.87280702 0.86784141 0.86752137 0.84978541 0.86266094 0.8583691 0.86206897 0.84615385 
0.87892377 0.87391304] mean value: 0.8640044867366126 key: test_recall value: [0.73076923 0.62962963 0.69230769 0.61538462 0.88461538 0.73076923 0.57692308 0.92307692 0.76923077 0.57692308] mean value: 0.7129629629629629 key: train_recall value: [0.84680851 0.84188034 0.86382979 0.84255319 0.85531915 0.85106383 0.85106383 0.84255319 0.83404255 0.85531915] mean value: 0.8484433533369704 key: test_roc_auc value: [0.77279202 0.66096866 0.67307692 0.76923077 0.80769231 0.71153846 0.71153846 0.86538462 0.82692308 0.63461538] mean value: 0.7433760683760684 key: train_roc_auc value: [0.86143844 0.85711038 0.86595745 0.84680851 0.85957447 0.85531915 0.85744681 0.84468085 0.85957447 0.86595745] mean value: 0.8573867975995636 key: test_jcc value: [0.61290323 0.48571429 0.51428571 0.57142857 0.6969697 0.55882353 0.5 0.77419355 0.68965517 0.44117647] mean value: 0.584515021500561 key: train_jcc value: [0.75378788 0.74621212 0.76315789 0.73333333 0.75280899 0.74626866 0.74906367 0.73062731 0.7480916 0.76136364] mean value: 0.7484715089652758 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02623487 0.02498984 0.02270436 0.02161646 0.02071953 0.02152848 0.02124238 0.02357602 0.02255344 0.02224708] mean value: 0.022741246223449706 key: score_time value: [0.0136168 0.01234031 0.01236081 0.01204133 0.01282573 0.01181722 0.01200223 0.01174998 0.01184368 0.01281047] mean value: 0.012340855598449708 key: test_mcc value: [0.81196581 0.69957726 0.77849894 0.92307692 0.84866842 0.77849894 0.73131034 0.88527041 0.81312325 0.73568294] mean value: 0.800567324577806 key: train_mcc value: [0.7995781 0.81236588 0.80451759 0.78726255 0.79574468 0.80451759 0.80428445 0.79155386 0.80000724 0.80851796] mean value: 0.8008349901879068 key: test_accuracy value: [0.90566038 0.8490566 0.88461538 0.96153846 0.92307692 0.88461538 0.86538462 0.94230769 0.90384615 0.86538462] mean value: 0.8985486211901307 key: train_accuracy value: [0.89978678 0.90618337 0.90212766 0.89361702 0.89787234 0.90212766 0.90212766 0.89574468 0.9 0.90425532] mean value: 0.9003842489679263 key: test_fscore value: [0.90566038 0.85714286 0.875 0.96153846 0.92592593 0.875 0.86792453 0.94117647 0.90909091 0.87272727] mean value: 0.8991186802674039 key: train_fscore value: [0.90021231 0.90598291 0.90336134 0.8940678 0.89787234 0.90336134 0.90254237 0.89640592 0.89978678 0.90405117] mean value: 0.9007644291954064 key: test_precision value: [0.88888889 0.82758621 0.95454545 0.96153846 0.89285714 0.95454545 0.85185185 0.96 0.86206897 0.82758621] mean value: 0.8981468633537599 key: train_precision value: [0.89830508 0.90598291 0.89211618 0.89029536 0.89787234 0.89211618 0.89873418 0.8907563 0.9017094 
0.90598291] mean value: 0.8973870842377724 key: test_recall value: [0.92307692 0.88888889 0.80769231 0.96153846 0.96153846 0.80769231 0.88461538 0.92307692 0.96153846 0.92307692] mean value: 0.9042735042735043 key: train_recall value: [0.90212766 0.90598291 0.91489362 0.89787234 0.89787234 0.91489362 0.90638298 0.90212766 0.89787234 0.90212766] mean value: 0.9042153118748864 key: test_roc_auc value: [0.90598291 0.8482906 0.88461538 0.96153846 0.92307692 0.88461538 0.86538462 0.94230769 0.90384615 0.86538462] mean value: 0.8985042735042735 key: train_roc_auc value: [0.89978178 0.90618294 0.90212766 0.89361702 0.89787234 0.90212766 0.90212766 0.89574468 0.9 0.90425532] mean value: 0.9003837061283869 key: test_jcc value: [0.82758621 0.75 0.77777778 0.92592593 0.86206897 0.77777778 0.76666667 0.88888889 0.83333333 0.77419355] mean value: 0.818421909117126 key: train_jcc value: [0.81853282 0.828125 0.82375479 0.80842912 0.81467181 0.82375479 0.82239382 0.81226054 0.81782946 0.82490272] mean value: 0.819465487041468 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.69608831 1.78880715 2.0408814 2.11449838 1.90394425 1.97782683 1.97935748 2.07013917 2.04062533 2.04656172] mean value: 1.9658730030059814 key: score_time value: [0.01268101 0.01269102 0.01288629 0.01712489 0.01619506 0.01270795 0.01263285 0.01373196 0.01519346 0.01510119] mean value: 0.01409456729888916 key: test_mcc value: [0.77350427 0.8116984 0.74466871 0.96225045 0.85634884 0.73568294 0.84866842 0.84866842 0.84615385 0.82305489] mean value: 0.825069919863237 key: train_mcc value: [0.97037106 0.98721563 0.99148936 0.9873145 1. 0.9957537 0.9957537 1. 0.98312115 0.9957537 ] mean value: 0.9906772783362988 key: test_accuracy value: [0.88679245 0.90566038 0.86538462 0.98076923 0.92307692 0.86538462 0.92307692 0.92307692 0.92307692 0.90384615] mean value: 0.9100145137880987 key: train_accuracy value: [0.98507463 0.99360341 0.99574468 0.99361702 1. 0.99787234 0.99787234 1. 0.99148936 0.99787234] mean value: 0.9953146123485914 key: test_fscore value: [0.88461538 0.90909091 0.85106383 0.98113208 0.92857143 0.85714286 0.92 0.92 0.92307692 0.9122807 ] mean value: 0.908697410951082 key: train_fscore value: [0.98494624 0.99357602 0.99574468 0.99357602 1. 0.9978678 0.9978678 1. 
0.99141631 0.99787686] mean value: 0.9952871726109697 key: test_precision value: [0.88461538 0.89285714 0.95238095 0.96296296 0.86666667 0.91304348 0.95833333 0.95833333 0.92307692 0.83870968] mean value: 0.9150979854906923 key: train_precision value: [0.99565217 0.99570815 0.99574468 1. 1. 1. 1. 1. 1. 0.99576271] mean value: 0.9982867721134951 key: test_recall value: [0.88461538 0.92592593 0.76923077 1. 1. 0.80769231 0.88461538 0.88461538 0.92307692 1. ] mean value: 0.9079772079772079 key: train_recall value: [0.97446809 0.99145299 0.99574468 0.98723404 1. 0.99574468 0.99574468 1. 0.98297872 1. ] mean value: 0.9923367885070012 key: test_roc_auc value: [0.88675214 0.90527066 0.86538462 0.98076923 0.92307692 0.86538462 0.92307692 0.92307692 0.92307692 0.90384615] mean value: 0.90997150997151 key: train_roc_auc value: [0.98509729 0.99359884 0.99574468 0.99361702 1. 0.99787234 0.99787234 1. 0.99148936 0.99787234] mean value: 0.995316421167485 key: test_jcc value: [0.79310345 0.83333333 0.74074074 0.96296296 0.86666667 0.75 0.85185185 0.85185185 0.85714286 0.83870968] mean value: 0.8346363390245481 key: train_jcc value: [0.97033898 0.98723404 0.99152542 0.98723404 1. 0.99574468 0.99574468 1. 
0.98297872 0.99576271] mean value: 0.9906563288856833 MCC on Blind test: 0.73 Accuracy on Blind test: 0.86 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02953148 0.02271771 0.02113962 0.02237082 0.01985717 0.02250838 0.02102804 0.02116036 0.02456594 0.02486062] mean value: 0.022974014282226562 key: score_time value: [0.01245403 0.00974846 0.00945234 0.00895834 0.00919867 0.00918245 0.00941133 0.00912023 0.00937629 0.01030064] mean value: 0.009720277786254884 key: test_mcc value: [0.81688878 0.92704716 0.92307692 0.88527041 0.84866842 0.96225045 0.84615385 0.84866842 0.77151675 1. ] mean value: 0.8829541177251579 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90566038 0.96226415 0.96153846 0.94230769 0.92307692 0.98076923 0.92307692 0.92307692 0.88461538 1. ] mean value: 0.9406386066763426 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.96428571 0.96153846 0.94339623 0.92592593 0.98039216 0.92307692 0.92592593 0.88888889 1. ] mean value: 0.9422521132010588 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.86206897 0.93103448 0.96153846 0.92592593 0.89285714 1. 0.92307692 0.89285714 0.85714286 1. ] mean value: 0.9246501901674316 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 0.96153846 0.96153846 0.96153846 0.92307692 0.96153846 0.92307692 1. ] mean value: 0.9615384615384616 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90669516 0.96153846 0.96153846 0.94230769 0.92307692 0.98076923 0.92307692 0.92307692 0.88461538 1. ] mean value: 0.9406695156695157 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.93103448 0.92592593 0.89285714 0.86206897 0.96153846 0.85714286 0.86206897 0.8 1. ] mean value: 0.8925970134590824 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.93 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12382007 0.12043929 0.12239695 0.12167001 0.12279487 0.1219244 0.12101841 0.12090349 0.12148833 0.12095571] mean value: 0.12174115180969239 key: score_time value: [0.01793957 0.01816487 0.01760411 0.01803041 0.0179038 0.01825953 0.01798344 0.01780105 0.01787972 0.01786542] mean value: 0.017943191528320312 key: test_mcc value: [0.77603503 0.66096866 0.77151675 0.88527041 0.88527041 0.81312325 0.76923077 0.92307692 0.89056356 0.71151247] mean value: 0.8086568232416546 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88679245 0.83018868 0.88461538 0.94230769 0.94230769 0.90384615 0.88461538 0.96153846 0.94230769 0.84615385] mean value: 0.902467343976778 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.83018868 0.88 0.94117647 0.94339623 0.89795918 0.88461538 0.96153846 0.94545455 0.86206897] mean value: 0.9035286805936604 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85714286 0.84615385 0.91666667 0.96 0.92592593 0.95652174 0.88461538 0.96153846 0.89655172 0.78125 ] mean value: 0.8986366605311508 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92307692 0.81481481 0.84615385 0.92307692 0.96153846 0.84615385 0.88461538 0.96153846 1. 0.96153846] mean value: 0.9122507122507123 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.88746439 0.83048433 0.88461538 0.94230769 0.94230769 0.90384615 0.88461538 0.96153846 0.94230769 0.84615385] mean value: 0.9025641025641026 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.70967742 0.78571429 0.88888889 0.89285714 0.81481481 0.79310345 0.92592593 0.89655172 0.75757576] mean value: 0.8265109407545448 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0100944 0.01009727 0.01013374 0.01006222 0.01016641 0.01010799 0.01006365 0.01008892 0.01019859 0.01012897] mean value: 0.010114216804504394 key: score_time value: [0.00880814 0.00876117 0.00883174 0.0087719 0.00877357 0.00879431 0.00871038 0.00887156 0.00874829 0.0087359 ] mean value: 0.008780694007873536 key: test_mcc value: [ 0.43447293 0.43366663 0.4233902 0.50336201 0.6172134 0.46291005 0.43929769 0.73568294 0.54006172 -0.08084521] mean value: 0.4509212363913005 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.71698113 0.71698113 0.71153846 0.75 0.80769231 0.73076923 0.71153846 0.86538462 0.76923077 0.46153846] mean value: 0.7241654571843251 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.71698113 0.72727273 0.70588235 0.76363636 0.81481481 0.74074074 0.74576271 0.87272727 0.77777778 0.53333333] mean value: 0.7398929227184086 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7037037 0.71428571 0.72 0.72413793 0.78571429 0.71428571 0.66666667 0.82758621 0.75 0.47058824] mean value: 0.7076968457881236 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.73076923 0.74074074 0.69230769 0.80769231 0.84615385 0.76923077 0.84615385 0.92307692 0.80769231 0.61538462] mean value: 0.777920227920228 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.71723647 0.71652422 0.71153846 0.75 0.80769231 0.73076923 0.71153846 0.86538462 0.76923077 0.46153846] mean value: 0.7241452991452991 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.55882353 0.57142857 0.54545455 0.61764706 0.6875 0.58823529 0.59459459 0.77419355 0.63636364 0.36363636] mean value: 0.5937877142217749 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.76943898 1.83275151 1.84324455 1.89071012 1.83631063 1.77790737 1.78165555 1.7913177 1.78348899 1.77847815] mean value: 1.8085303544998168 key: score_time value: [0.09400439 0.09472203 0.10127854 0.10282397 0.09269404 0.09585142 0.09383512 0.09480047 0.15068531 0.09217691] mean value: 0.10128722190856934 key: test_mcc value: [0.81688878 0.92450142 0.9258201 0.92307692 0.9258201 0.9258201 0.9258201 0.96225045 0.9258201 0.9258201 ] mean value: 0.9181638177359281 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90566038 0.96226415 0.96153846 0.96153846 0.96153846 0.96153846 0.96153846 0.98076923 0.96153846 0.96153846] mean value: 0.9579462989840348 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.96296296 0.96 0.96153846 0.96296296 0.96 0.96 0.98039216 0.96296296 0.96296296] mean value: 0.9582873379343968 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.86206897 0.96296296 1. 0.96153846 0.92857143 1. 1. 1. 0.92857143 0.92857143] mean value: 0.9572284675732952 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 0.96296296 0.92307692 0.96153846 1. 0.92307692 0.92307692 0.96153846 1. 1. ] mean value: 0.9616809116809117 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90669516 0.96225071 0.96153846 0.96153846 0.96153846 0.96153846 0.96153846 0.98076923 0.96153846 0.96153846] mean value: 0.9580484330484331 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.92857143 0.92307692 0.92592593 0.92857143 0.92307692 0.92307692 0.96153846 0.92857143 0.92857143] mean value: 0.9204314204314205 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.92528653 0.9376781 0.96513009 1.02676868 1.00038338 1.00743413 0.93951488 1.03116369 0.93464684 0.97854924] mean value: 0.974655556678772 key: score_time value: [0.26444244 0.20015526 0.24083567 0.22042966 0.2193284 0.27657557 0.22205067 0.23441887 0.12471294 0.23463321] mean value: 0.22375826835632323 key: test_mcc value: [0.81688878 0.77350427 0.9258201 0.92307692 0.9258201 0.88527041 0.9258201 0.96225045 0.9258201 0.9258201 ] mean value: 0.8990091339347005 key: train_mcc value: [0.96162939 0.95309971 0.95744681 0.95320012 0.95748148 0.95320012 0.95744681 0.95320012 0.94893617 0.95748148] mean value: 0.9553122213642232 key: test_accuracy value: [0.90566038 0.88679245 0.96153846 0.96153846 0.96153846 0.94230769 0.96153846 0.98076923 0.96153846 0.96153846] mean value: 0.9484760522496372 key: train_accuracy value: [0.98081023 0.97654584 0.9787234 0.97659574 0.9787234 0.97659574 0.9787234 0.97659574 0.97446809 0.9787234 ] mean value: 0.9776505012929274 key: test_fscore value: [0.90909091 0.88888889 0.96 0.96153846 0.96296296 0.94117647 0.96 0.98039216 0.96296296 0.96296296] mean value: 0.9489975775858129 key: train_fscore value: [0.98081023 0.9764454 0.9787234 0.97654584 0.97863248 0.97654584 0.9787234 0.97654584 0.97446809 0.97863248] mean value: 0.9776073008221619 key: test_precision value: [0.86206897 0.88888889 1. 0.96153846 0.92857143 0.96 1. 1. 
0.92857143 0.92857143] mean value: 0.9458210601658877 key: train_precision value: [0.98290598 0.97854077 0.9787234 0.97863248 0.98283262 0.97863248 0.9787234 0.97863248 0.97446809 0.98283262] mean value: 0.979492432100413 key: test_recall value: [0.96153846 0.88888889 0.92307692 0.96153846 1. 0.92307692 0.92307692 0.96153846 1. 1. ] mean value: 0.9542735042735043 key: train_recall value: [0.9787234 0.97435897 0.9787234 0.97446809 0.97446809 0.97446809 0.9787234 0.97446809 0.97446809 0.97446809] mean value: 0.975733769776323 key: test_roc_auc value: [0.90669516 0.88675214 0.96153846 0.96153846 0.96153846 0.94230769 0.96153846 0.98076923 0.96153846 0.96153846] mean value: 0.9485754985754986 key: train_roc_auc value: [0.98081469 0.97654119 0.9787234 0.97659574 0.9787234 0.97659574 0.9787234 0.97659574 0.97446809 0.9787234 ] mean value: 0.977650481905801 key: test_jcc value: [0.83333333 0.8 0.92307692 0.92592593 0.92857143 0.88888889 0.92307692 0.96153846 0.92857143 0.92857143] mean value: 0.9041554741554741 key: train_jcc value: [0.9623431 0.9539749 0.95833333 0.95416667 0.958159 0.95416667 0.95833333 0.95416667 0.95020747 0.958159 ] mean value: 0.9562010118809934 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01119733 0.01217985 0.01093841 0.01117277 0.01136184 0.0108006 0.01067758 0.01132941 0.01079106 0.01140022] mean value: 0.011184906959533692 key: score_time value: [0.00896358 0.00987315 0.00957155 0.00922394 0.00965858 0.00929475 0.0097177 0.00967407 0.00959873 0.00960207] mean value: 0.009517812728881836 key: test_mcc value: [0.73646724 0.47360961 0.65433031 0.88527041 0.69436507 0.69230769 0.65824263 0.77151675 0.69436507 0.65433031] mean value: 0.6914805091639882 key: train_mcc value: [0.73140924 0.75708961 0.72356805 0.74043224 0.76629748 0.77032436 0.70276422 0.67337154 0.73659716 0.75330062] mean value: 0.7355154534366981 key: test_accuracy value: [0.86792453 0.73584906 0.82692308 0.94230769 0.84615385 0.84615385 0.82692308 0.88461538 0.84615385 0.82692308] mean value: 0.8449927431059506 key: train_accuracy value: [0.86567164 0.87846482 0.86170213 0.87021277 0.88297872 0.88510638 0.85106383 0.83617021 0.86808511 0.87659574] mean value: 0.8676051354171392 key: test_fscore value: [0.86792453 0.73076923 0.82352941 0.94117647 0.85185185 0.84615385 0.81632653 0.88 0.85185185 0.82352941] mean value: 0.843311313365856 key: train_fscore value: [0.86509636 0.87688985 0.86021505 0.87048832 0.88469602 0.88607595 0.84782609 0.83150985 0.86580087 0.87553648] mean value: 0.8664134831445992 key: test_precision value: [0.85185185 0.76 0.84 0.96 0.82142857 0.84615385 0.86956522 0.91666667 0.82142857 0.84 ] mean value: 0.8527094724920812 key: train_precision value: [0.87068966 0.88646288 0.86956522 0.86864407 0.87190083 0.87866109 0.86666667 0.85585586 0.88105727 0.88311688] mean 
value: 0.873262041113066 key: test_recall value: [0.88461538 0.7037037 0.80769231 0.92307692 0.88461538 0.84615385 0.76923077 0.84615385 0.88461538 0.80769231] mean value: 0.8357549857549857 key: train_recall value: [0.85957447 0.86752137 0.85106383 0.87234043 0.89787234 0.89361702 0.82978723 0.80851064 0.85106383 0.86808511] mean value: 0.8599436261138389 key: test_roc_auc value: [0.86823362 0.73646724 0.82692308 0.94230769 0.84615385 0.84615385 0.82692308 0.88461538 0.84615385 0.82692308] mean value: 0.8450854700854701 key: train_roc_auc value: [0.86568467 0.87844153 0.86170213 0.87021277 0.88297872 0.88510638 0.85106383 0.83617021 0.86808511 0.87659574] mean value: 0.8676041098381524 key: test_jcc value: [0.76666667 0.57575758 0.7 0.88888889 0.74193548 0.73333333 0.68965517 0.78571429 0.74193548 0.7 ] mean value: 0.7323886890516479 key: train_jcc value: [0.76226415 0.78076923 0.75471698 0.77067669 0.79323308 0.79545455 0.73584906 0.71161049 0.76335878 0.77862595] mean value: 0.7646558959054925 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08577728 0.07616615 0.09151053 0.07672358 0.0770278 0.08412862 0.06949353 0.0763917 0.08024454 0.08369136] mean value: 0.08011550903320312 key: score_time value: [0.01101613 0.01094556 0.0117805 0.01137733 0.0115149 0.01199055 0.01296544 0.01189756 0.01115489 0.0111413 ] mean value: 0.01157841682434082 key: test_mcc value: [0.88746439 0.96291111 0.96225045 0.96225045 0.9258201 0.96225045 0.88527041 0.9258201 0.9258201 0.96225045] mean value: 0.9362108002286528 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94339623 0.98113208 0.98076923 0.98076923 0.96153846 0.98076923 0.94230769 0.96153846 0.96153846 0.98076923] mean value: 0.9674528301886792 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94339623 0.98181818 0.98039216 0.98113208 0.96296296 0.98039216 0.94117647 0.96 0.96296296 0.98113208] mean value: 0.9675365269416324 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92592593 0.96428571 1. 0.96296296 0.92857143 1. 0.96 1. 0.92857143 0.96296296] mean value: 0.9633280423280424 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 1. 1. 0.96153846 0.92307692 0.92307692 1. 1. ] mean value: 0.9730769230769231 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.94373219 0.98076923 0.98076923 0.98076923 0.96153846 0.98076923 0.94230769 0.96153846 0.96153846 0.98076923] mean value: 0.9674501424501425 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.89285714 0.96428571 0.96153846 0.96296296 0.92857143 0.96153846 0.88888889 0.92307692 0.92857143 0.96296296] mean value: 0.9375254375254375 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04911208 0.07845759 0.0794065 0.07849884 0.07238936 0.04618835 0.09146833 0.09963346 0.07126188 0.0715549 ] mean value: 0.07379713058471679 key: score_time value: [0.01892233 0.0187223 0.01950288 0.0187819 0.01244593 0.01247454 0.01240921 0.01876998 0.02508521 0.01220512] mean value: 0.016931939125061034 key: test_mcc value: [0.77603503 0.73997003 0.77849894 0.88527041 0.84866842 0.81312325 0.6172134 0.80829038 0.74466871 0.71151247] mean value: 0.7723251042423636 key: train_mcc value: [0.89794254 0.89379475 0.91104256 0.90651431 0.91922384 0.91084449 0.91084449 0.91502618 0.91519196 0.90641581] mean value: 0.9086840926765474 key: test_accuracy value: 
[0.88679245 0.86792453 0.88461538 0.94230769 0.92307692 0.90384615 0.80769231 0.90384615 0.86538462 0.84615385] mean value: 0.8831640058055152 key: train_accuracy value: [0.94882729 0.9466951 0.95531915 0.95319149 0.95957447 0.95531915 0.95531915 0.95744681 0.95744681 0.95319149] mean value: 0.9542330898697999 key: test_fscore value: [0.88888889 0.87719298 0.875 0.94339623 0.92592593 0.89795918 0.81481481 0.90196078 0.87719298 0.86206897] mean value: 0.8864400754461441 key: train_fscore value: [0.94957983 0.94736842 0.95597484 0.9535865 0.95983087 0.95578947 0.95578947 0.95780591 0.95798319 0.95338983] mean value: 0.9547098338777809 key: test_precision value: [0.85714286 0.83333333 0.95454545 0.92592593 0.89285714 0.95652174 0.78571429 0.92 0.80645161 0.78125 ] mean value: 0.871374235155266 key: train_precision value: [0.93775934 0.93360996 0.94214876 0.94560669 0.95378151 0.94583333 0.94583333 0.94979079 0.94605809 0.94936709] mean value: 0.9449788903641747 key: test_recall value: [0.92307692 0.92592593 0.80769231 0.96153846 0.96153846 0.84615385 0.84615385 0.88461538 0.96153846 0.96153846] mean value: 0.9079772079772079 key: train_recall value: [0.96170213 0.96153846 0.97021277 0.96170213 0.96595745 0.96595745 0.96595745 0.96595745 0.97021277 0.95744681] mean value: 0.9646644844517185 key: test_roc_auc value: [0.88746439 0.86680912 0.88461538 0.94230769 0.92307692 0.90384615 0.80769231 0.90384615 0.86538462 0.84615385] mean value: 0.8831196581196582 key: train_roc_auc value: [0.94879978 0.94672668 0.95531915 0.95319149 0.95957447 0.95531915 0.95531915 0.95744681 0.95744681 0.95319149] mean value: 0.9542334969994545 key: test_jcc value: [0.8 0.78125 0.77777778 0.89285714 0.86206897 0.81481481 0.6875 0.82142857 0.78125 0.75757576] mean value: 0.7976523029971305 key: train_jcc value: [0.904 0.9 0.91566265 0.91129032 0.92276423 0.91532258 0.91532258 0.91902834 0.91935484 0.91093117] mean value: 0.9133676714995371 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 
Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge 
ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01811981 0.0114007 0.00968623 0.00974464 0.01074147 0.00977039 0.01065302 0.01098585 0.00996947 0.00996566] mean value: 0.01110372543334961 key: score_time value: [0.0099473 0.00906754 0.00856543 0.00886488 0.00860167 0.00880289 0.00895095 0.00944519 0.00875688 0.00867391] mean value: 0.00896766185760498 key: test_mcc value: [0.6980057 0.51359557 0.81312325 0.84615385 0.84615385 0.70064905 0.65824263 0.65433031 0.73568294 0.6172134 ] mean value: 0.7083150533360429 key: train_mcc value: [0.68508531 0.71894691 0.70253486 0.69424587 0.74910575 0.74478875 0.69401929 0.67747959 0.69041892 0.71541847] mean value: 0.7072043720898497 key: test_accuracy value: [0.8490566 0.75471698 0.90384615 0.92307692 0.92307692 0.84615385 0.82692308 0.82692308 0.86538462 0.80769231] mean value: 0.8526850507982584 key: train_accuracy value: [0.84221748 0.85927505 0.85106383 0.84680851 0.87446809 0.87234043 0.84680851 0.83829787 0.84468085 0.85744681] mean value: 0.8533407430930454 key: test_fscore value: [0.84615385 0.74509804 0.89795918 0.92307692 0.92307692 0.83333333 0.81632653 0.82352941 0.87272727 0.8 ] mean value: 0.8481281463634405 key: train_fscore value: [0.83913043 0.85652174 0.84848485 0.84347826 0.87311828 0.87124464 
0.84415584 0.83406114 0.84026258 0.85466377] mean value: 0.850512153401787 key: test_precision value: [0.84615385 0.79166667 0.95652174 0.92307692 0.92307692 0.90909091 0.86956522 0.84 0.82758621 0.83333333] mean value: 0.8720071764816892 key: train_precision value: [0.85777778 0.87168142 0.86343612 0.86222222 0.8826087 0.87878788 0.85903084 0.85650224 0.86486486 0.87168142] mean value: 0.8668593473668214 key: test_recall value: [0.84615385 0.7037037 0.84615385 0.92307692 0.92307692 0.76923077 0.76923077 0.80769231 0.92307692 0.76923077] mean value: 0.8280626780626781 key: train_recall value: [0.8212766 0.84188034 0.83404255 0.82553191 0.86382979 0.86382979 0.82978723 0.81276596 0.81702128 0.83829787] mean value: 0.8348263320603746 key: test_roc_auc value: [0.84900285 0.75569801 0.90384615 0.92307692 0.92307692 0.84615385 0.82692308 0.82692308 0.86538462 0.80769231] mean value: 0.8527777777777779 key: train_roc_auc value: [0.84226223 0.85923804 0.85106383 0.84680851 0.87446809 0.87234043 0.84680851 0.83829787 0.84468085 0.85744681] mean value: 0.853341516639389 key: test_jcc value: [0.73333333 0.59375 0.81481481 0.85714286 0.85714286 0.71428571 0.68965517 0.7 0.77419355 0.66666667] mean value: 0.7400984964187133 key: train_jcc value: [0.72284644 0.74904943 0.73684211 0.72932331 0.77480916 0.77186312 0.73033708 0.71535581 0.7245283 0.74621212] mean value: 0.7401166870309306 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01385999 0.0178628 0.02119446 0.02347088 0.0253675 0.01830435 0.02012086 0.01882529 0.02110195 0.020648 ] mean value: 0.020075607299804687 key: score_time value: [0.00991964 0.01136136 0.01177001 0.01195335 0.01180339 0.01197219 0.01181316 0.01186776 0.01188731 0.01246667] mean value: 0.011681485176086425 key: test_mcc value: [0.73609205 0.70527596 0.74466871 0.88527041 0.82305489 0.64676167 0.74466871 0.82305489 0.77151675 0.66666667] mean value: 0.7547030706307085 key: train_mcc value: [0.86611567 0.88176453 0.88164966 0.91163756 0.87947498 0.84270412 0.86448019 0.76515574 0.91064654 0.74380085] mean value: 0.8547429840717615 key: test_accuracy value: [0.86792453 0.8490566 0.86538462 0.94230769 0.90384615 0.80769231 0.86538462 0.90384615 0.88461538 0.80769231] mean value: 0.8697750362844703 key: train_accuracy value: [0.93176972 0.94029851 0.94042553 0.95531915 0.93829787 0.91914894 0.92978723 0.87234043 0.95531915 0.85957447] mean value: 0.9242280996234632 key: test_fscore value: [0.8627451 0.86206897 0.85106383 0.94117647 0.9122807 0.77272727 0.85106383 0.89361702 0.88888889 0.83870968] mean value: 0.8674341755785658 key: train_fscore value: [0.92920354 0.94166667 0.93913043 0.95424837 0.9406953 0.91479821 0.9258427 0.85576923 0.95541401 0.8754717 ] mean value: 0.9232240148337406 key: test_precision value: [0.88 0.80645161 0.95238095 0.96 0.83870968 0.94444444 0.95238095 1. 
0.85714286 0.72222222] mean value: 0.8913732718894009 key: train_precision value: [0.96774194 0.91869919 0.96 0.97767857 0.90551181 0.96682464 0.98095238 0.98342541 0.95338983 0.78644068] mean value: 0.9400664453269295 key: test_recall value: [0.84615385 0.92592593 0.76923077 0.92307692 1. 0.65384615 0.76923077 0.80769231 0.92307692 1. ] mean value: 0.8618233618233618 key: train_recall value: [0.89361702 0.96581197 0.91914894 0.93191489 0.9787234 0.86808511 0.87659574 0.75744681 0.95744681 0.98723404] mean value: 0.9136024731769412 key: test_roc_auc value: [0.86752137 0.84757835 0.86538462 0.94230769 0.90384615 0.80769231 0.86538462 0.90384615 0.88461538 0.80769231] mean value: 0.8695868945868945 key: train_roc_auc value: [0.93185125 0.94035279 0.94042553 0.95531915 0.93829787 0.91914894 0.92978723 0.87234043 0.95531915 0.85957447] mean value: 0.9242416803055101 key: test_jcc value: [0.75862069 0.75757576 0.74074074 0.88888889 0.83870968 0.62962963 0.74074074 0.80769231 0.8 0.72222222] mean value: 0.7684820654564815 key: train_jcc value: [0.8677686 0.88976378 0.8852459 0.9125 0.88803089 0.84297521 0.86192469 0.74789916 0.91463415 0.77852349] mean value: 0.8589265852981367 MCC on Blind test: 0.71 Accuracy on Blind test: 0.83 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01879811 0.01746774 0.02222395 0.02218533 0.02078748 0.01897097 0.02105546 0.02173853 0.02300715 0.02159905] mean value: 0.020783376693725587 key: score_time value: [0.01111412 0.01320291 0.01229048 0.01212955 0.0149734 0.01178837 0.01732373 0.01178002 0.01172805 0.0117414 ] mean value: 0.012807202339172364 key: test_mcc value: [0.85164138 0.65110205 0.60697698 0.85634884 0.80829038 0.61494005 0.77849894 0.84866842 0.79056942 0.73131034] mean value: 0.7538346799690747 key: train_mcc value: [0.88621044 0.79855158 0.65963501 0.84046667 0.82974725 0.83758899 0.87436938 0.91519196 0.87093638 0.88085106] mean value: 0.8393548746757886 key: test_accuracy value: [0.9245283 0.81132075 0.76923077 0.92307692 0.90384615 0.78846154 0.88461538 0.92307692 0.88461538 0.86538462] mean value: 0.8678156748911466 key: train_accuracy value: [0.9424307 0.89339019 0.80638298 0.91702128 0.90851064 0.91489362 0.93617021 0.95744681 0.93404255 0.94042553] mean value: 0.9150714512543665 key: test_fscore value: [0.92592593 0.83870968 0.7 0.92857143 0.90196078 0.74418605 0.875 0.92 0.89655172 0.86792453] mean value: 0.8598830115181881 key: train_fscore value: [0.94409938 0.9015748 0.76240209 0.92184369 0.8997669 0.9086758 0.9339207 0.95689655 0.93660532 0.94042553] mean value: 0.9106210762491109 key: test_precision value: [0.89285714 0.74285714 1. 
0.86666667 0.92 0.94117647 0.95454545 0.95833333 0.8125 0.85185185] mean value: 0.8940788062699827 key: train_precision value: [0.91935484 0.83576642 0.98648649 0.87121212 0.99484536 0.98029557 0.96803653 0.96943231 0.9015748 0.94042553] mean value: 0.93674299762485 key: test_recall value: [0.96153846 0.96296296 0.53846154 1. 0.88461538 0.61538462 0.80769231 0.88461538 1. 0.88461538] mean value: 0.853988603988604 key: train_recall value: [0.97021277 0.97863248 0.6212766 0.9787234 0.8212766 0.84680851 0.90212766 0.94468085 0.97446809 0.94042553] mean value: 0.8978632478632479 key: test_roc_auc value: [0.92521368 0.80840456 0.76923077 0.92307692 0.90384615 0.78846154 0.88461538 0.92307692 0.88461538 0.86538462] mean value: 0.8675925925925926 key: train_roc_auc value: [0.94237134 0.89357156 0.80638298 0.91702128 0.90851064 0.91489362 0.93617021 0.95744681 0.93404255 0.94042553] mean value: 0.9150836515730132 key: test_jcc value: [0.86206897 0.72222222 0.53846154 0.86666667 0.82142857 0.59259259 0.77777778 0.85185185 0.8125 0.76666667] mean value: 0.7612236853185129 key: train_jcc value: [0.89411765 0.82078853 0.61603376 0.85501859 0.81779661 0.83263598 0.87603306 0.91735537 0.88076923 0.8875502 ] mean value: 0.839809897491723 MCC on Blind test: 0.73 Accuracy on Blind test: 0.86 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18865156 0.17809844 0.17845082 0.18266034 0.1782515 0.17729211 0.17885733 0.17905545 0.18050551 0.18259931] mean value: 0.1804422378540039 key: score_time value: [0.01532364 0.01549387 0.01535821 0.01534915 0.01540208 0.01566911 0.01548195 0.01556087 0.01541352 0.01596475] mean value: 0.015501713752746582 key: test_mcc value: [0.8116984 0.92450142 0.96225045 0.96225045 0.9258201 0.96225045 0.81312325 0.9258201 0.96225045 0.92307692] mean value: 0.9173041989812138 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90566038 0.96226415 0.98076923 0.98076923 0.96153846 0.98076923 0.90384615 0.96153846 0.98076923 0.96153846] mean value: 0.9579462989840348 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90196078 0.96296296 0.98039216 0.98113208 0.96296296 0.98039216 0.89795918 0.96 0.98113208 0.96153846] mean value: 0.9570432820120469 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92 0.96296296 1. 0.96296296 0.92857143 1. 0.95652174 1. 0.96296296 0.96153846] mean value: 0.9655520518129214 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88461538 0.96296296 0.96153846 1. 1. 0.96153846 0.84615385 0.92307692 1. 0.96153846] mean value: 0.9501424501424501 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.90527066 0.96225071 0.98076923 0.98076923 0.96153846 0.98076923 0.90384615 0.96153846 0.98076923 0.96153846] mean value: 0.957905982905983 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.82142857 0.92857143 0.96153846 0.96296296 0.92857143 0.96153846 0.81481481 0.92307692 0.96296296 0.92592593] mean value: 0.9191391941391941 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.98 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06805992 0.06910896 0.06634355 0.0792799 0.0696907 0.07058072 0.088202 0.07411814 0.07483792 0.07089567] mean value: 0.07311174869537354 key: score_time value: [0.02007461 0.0277791 0.03617716 0.03125453 0.03761816 0.03880787 0.03588963 0.03868651 0.02243757 0.03866386] mean value: 0.032738900184631346 key: test_mcc value: [0.85164138 0.96291111 0.9258201 0.96225045 0.9258201 0.96225045 0.84866842 0.88527041 0.89056356 0.96225045] mean value: 0.9177446427850156 key: train_mcc value: [0.98721586 0.98721563 0.9873145 0.97873227 0.9957537 0.9873145 0.98724298 0.99152527 0.97478586 0.98297872] mean value: 0.9860079271688947 key: test_accuracy value: [0.9245283 0.98113208 0.96153846 0.98076923 0.96153846 0.98076923 0.92307692 0.94230769 0.94230769 0.98076923] mean value: 0.9578737300435414 key: train_accuracy value: [0.99360341 0.99360341 0.99361702 0.9893617 0.99787234 0.99361702 0.99361702 0.99574468 0.98723404 0.99148936] mean value: 0.9929760014517081 key: test_fscore value: [0.92592593 0.98181818 0.96 
0.98113208 0.96296296 0.98039216 0.92 0.94339623 0.94545455 0.98113208] mean value: 0.9582214150382852 key: train_fscore value: [0.99360341 0.99357602 0.99357602 0.98933902 0.9978678 0.99357602 0.99363057 0.9957265 0.98739496 0.99148936] mean value: 0.9929779674593665 key: test_precision value: [0.89285714 0.96428571 1. 0.96296296 0.92857143 1. 0.95833333 0.92592593 0.89655172 0.96296296] mean value: 0.9492451195037402 key: train_precision value: [0.9957265 0.99570815 1. 0.99145299 1. 1. 0.99152542 1. 0.97510373 0.99148936] mean value: 0.99410061615567 key: test_recall value: [0.96153846 1. 0.92307692 1. 1. 0.96153846 0.88461538 0.96153846 1. 1. ] mean value: 0.9692307692307692 key: train_recall value: [0.99148936 0.99145299 0.98723404 0.98723404 0.99574468 0.98723404 0.99574468 0.99148936 1. 0.99148936] mean value: 0.9919112565921077 key: test_roc_auc value: [0.92521368 0.98076923 0.96153846 0.98076923 0.96153846 0.98076923 0.92307692 0.94230769 0.94230769 0.98076923] mean value: 0.957905982905983 key: train_roc_auc value: [0.99360793 0.99359884 0.99361702 0.9893617 0.99787234 0.99361702 0.99361702 0.99574468 0.98723404 0.99148936] mean value: 0.9929759956355702 key: test_jcc value: [0.86206897 0.96428571 0.92307692 0.96296296 0.92857143 0.96153846 0.85185185 0.89285714 0.89655172 0.96296296] mean value: 0.9206728137762621 key: train_jcc value: [0.98728814 0.98723404 0.98723404 0.97890295 0.99574468 0.98723404 0.98734177 0.99148936 0.97510373 0.98312236] mean value: 0.9860695128853415 MCC on Blind test: 0.9 Accuracy on Blind test: 0.95 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.16664958 0.13899064 0.1713109 0.12175751 0.12432766 0.15624762 0.1744194 0.12778425 0.15947342 0.16238427] mean value: 0.1503345251083374 key: score_time value: [0.02422047 0.01502085 0.02464485 0.02465963 0.01550269 0.01529956 0.02474427 0.02651215 0.02499962 0.02835417] mean value: 0.022395825386047362 key: test_mcc value: [0.6980057 0.54700855 0.50336201 0.77151675 0.63245553 0.65433031 0.76923077 0.81312325 0.73131034 0.50037023] mean value: 0.662071343296795 key: train_mcc value: [0.98728791 0.99150708 0.9873145 0.9873145 0.9873145 0.99152527 0.9873145 0.9873145 0.9873145 0.9873145 ] mean value: 0.9881521740698564 key: test_accuracy value: [0.8490566 0.77358491 0.75 0.88461538 0.80769231 0.82692308 0.88461538 0.90384615 0.86538462 0.75 ] mean value: 0.8295718432510886 key: train_accuracy value: [0.99360341 0.99573561 0.99361702 0.99361702 0.99361702 0.99574468 0.99361702 0.99361702 0.99361702 0.99361702] mean value: 0.9940402848977 key: test_fscore value: [0.84615385 0.77777778 0.73469388 0.88 0.82758621 0.82352941 0.88461538 0.89795918 0.86792453 0.74509804] mean value: 0.8285338255950329 key: train_fscore value: [0.99357602 0.99570815 0.99357602 0.99357602 0.99357602 0.9957265 0.99357602 0.99357602 0.99357602 0.99357602] mean value: 0.9940042787277901 key: test_precision value: [0.84615385 0.77777778 0.7826087 0.91666667 0.75 0.84 0.88461538 0.95652174 0.85185185 0.76 ] mean value: 0.8366195961848135 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.84615385 0.77777778 0.69230769 0.84615385 0.92307692 0.80769231 0.88461538 0.84615385 0.88461538 0.73076923] mean value: 0.8239316239316239 key: train_recall value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936 0.98723404 0.98723404 0.98723404 0.98723404] mean value: 0.9880814693580651 key: test_roc_auc value: [0.84900285 0.77350427 0.75 0.88461538 0.80769231 0.82692308 0.88461538 0.90384615 0.86538462 0.75 ] mean value: 0.8295584045584046 key: train_roc_auc value: [0.99361702 0.9957265 0.99361702 0.99361702 0.99361702 0.99574468 0.99361702 0.99361702 0.99361702 0.99361702] mean value: 0.9940407346790325 key: test_jcc value: [0.73333333 0.63636364 0.58064516 0.78571429 0.70588235 0.7 0.79310345 0.81481481 0.76666667 0.59375 ] mean value: 0.7110273699400098 key: train_jcc value: [0.98723404 0.99145299 0.98723404 0.98723404 0.98723404 0.99148936 0.98723404 0.98723404 0.98723404 0.98723404] mean value: 0.9880814693580651 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.7296083 0.72827125 0.7328434 0.73026657 0.72733808 0.73085546 0.72294044 0.72570992 0.74692559 0.73820591] mean value: 0.7312964916229248 key: score_time value: [0.00976491 0.00955319 0.00944519 0.00955868 0.00969505 0.00937343 0.00935888 0.00958729 0.01031232 0.00962877] mean value: 0.009627771377563477 key: test_mcc value: [0.85164138 0.92704716 0.96225045 0.96225045 0.9258201 0.96225045 0.88527041 0.92307692 0.9258201 0.96225045] mean value: 0.9287677874797963 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 0.96226415 0.98076923 0.98076923 0.96153846 0.98076923 0.94230769 0.96153846 0.96153846 0.98076923] mean value: 0.9636792452830188 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92592593 0.96428571 0.98039216 0.98113208 0.96296296 0.98039216 0.94117647 0.96153846 0.96296296 0.98113208] mean value: 0.964190096293315 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 0.93103448 1. 0.96296296 0.92857143 1. 0.96 0.96153846 0.92857143 0.96296296] mean value: 0.9528498870223008 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 1. 1. 0.96153846 0.92307692 0.96153846 1. 1. ] mean value: 0.9769230769230769 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.92521368 0.96153846 0.98076923 0.98076923 0.96153846 0.98076923 0.94230769 0.96153846 0.96153846 0.98076923] mean value: 0.9636752136752137 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86206897 0.93103448 0.96153846 0.96296296 0.92857143 0.96153846 0.88888889 0.92592593 0.92857143 0.96296296] mean value: 0.9314063969236382 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10,
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03424716 0.03178167 0.03212857 0.05195475 0.05340314 0.04832959 0.04696679 0.03173852 0.0314827 0.03206182] mean value: 0.0394094705581665 key: score_time value: [0.01278758 0.0395515 0.03736591 0.0196743 0.02260709 0.01882935 0.01456594 0.01464891 0.01470041 0.02665329] mean value: 0.022138428688049317 key: test_mcc value: [0.54700855 0.25905207 0.62279916 0.34684399 0.33333333 0.53846154 0.51916999 0.18257419 0.63245553 0.31622777] mean value: 0.4297926104204046 key: train_mcc value: [0.95006652 0.73515544 0.88334763 0.5920935 0.83105203 0.97880317 0.93009643 0.6846532 0.96191988 0.85319469] mean value: 0.840038248069776 key: test_accuracy value: [0.77358491 0.62264151 0.80769231 0.65384615 0.65384615 0.76923077 0.75 0.57692308 
0.80769231 0.65384615] mean value: 0.7069303338171262 key: train_accuracy value: [0.97441365 0.85074627 0.93829787 0.75957447 0.90851064 0.9893617 0.96382979 0.81914894 0.98085106 0.9212766 ] mean value: 0.9106010978541941 key: test_fscore value: [0.76923077 0.6875 0.82142857 0.71875 0.70967742 0.76923077 0.77966102 0.66666667 0.82758621 0.68965517] mean value: 0.7439386592171113 key: train_fscore value: [0.97510373 0.86988848 0.94188377 0.80617496 0.91617934 0.98929336 0.9650924 0.84684685 0.98105263 0.9270217 ] mean value: 0.9218537211188351 key: test_precision value: [0.76923077 0.59459459 0.76666667 0.60526316 0.61111111 0.76923077 0.6969697 0.55 0.75 0.625 ] mean value: 0.6738066765698345 key: train_precision value: [0.951417 0.76973684 0.89015152 0.67528736 0.84532374 0.99568966 0.93253968 0.734375 0.97083333 0.86397059] mean value: 0.8629324717915119 key: test_recall value: [0.76923077 0.81481481 0.88461538 0.88461538 0.84615385 0.76923077 0.88461538 0.84615385 0.92307692 0.76923077] mean value: 0.8391737891737892 key: train_recall value: [1. 1. 1. 1. 1. 0.98297872 1. 1. 0.99148936 1. 
] mean value: 0.9974468085106383 key: test_roc_auc value: [0.77350427 0.61894587 0.80769231 0.65384615 0.65384615 0.76923077 0.75 0.57692308 0.80769231 0.65384615] mean value: 0.7065527065527065 key: train_roc_auc value: [0.97435897 0.85106383 0.93829787 0.75957447 0.90851064 0.9893617 0.96382979 0.81914894 0.98085106 0.9212766 ] mean value: 0.9106273867975996 key: test_jcc value: [0.625 0.52380952 0.6969697 0.56097561 0.55 0.625 0.63888889 0.5 0.70588235 0.52631579] mean value: 0.5952841861839068 key: train_jcc value: [0.951417 0.76973684 0.89015152 0.67528736 0.84532374 0.97881356 0.93253968 0.734375 0.96280992 0.86397059] mean value: 0.8604425206086777 MCC on Blind test: 0.41 Accuracy on Blind test: 0.67 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, 
reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02965641 0.03908777 0.03907251 0.03891897 0.03882527 0.03877044 0.03892875 0.03883457 0.03862453 0.0391221 ] mean value: 0.03798413276672363 key: score_time value: [0.01900005 0.01908803 0.01899123 0.01887655 0.01898837 0.01889229 0.01906919 0.0189023 0.0188899 0.01907778] mean value: 0.01897757053375244 key: test_mcc value: [0.85164138 0.73997003 0.77849894 0.92307692 0.89056356 0.74466871 0.73131034 0.88527041 0.81312325 0.77849894] mean value: 0.8136622485831022 key: train_mcc value: [0.85528213 0.86366944 0.85168866 0.85544308 0.85535013 0.85581519 0.8769849 0.86847048 0.86411148 0.85107154] mean value: 0.8597887013936425 key: test_accuracy value: [0.9245283 0.86792453 0.88461538 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.904245283018868 key: train_accuracy value: [0.92750533 0.93176972 0.92553191 0.92765957 0.92765957 0.92765957 0.93829787 0.93404255 0.93191489 0.92553191] mean value: 0.929757292564533 key: test_fscore value: [0.92592593 0.87719298 0.875 0.96153846 0.94545455 0.85106383 0.86792453 0.94117647 0.90909091 0.89285714] mean value: 0.9047224796000481 key: train_fscore value: [0.92857143 0.93220339 0.92693111 0.92827004 0.9279661 0.92887029 0.93920335 0.93501048 0.93277311 0.92569002] mean value: 0.9305489328602898 key: test_precision value: [0.89285714 0.83333333 0.95454545 0.96153846 0.89655172 0.95238095 0.85185185 0.96 0.86206897 0.83333333] mean value: 0.8998461219495703 key: train_precision value: [0.91701245 0.92436975 0.90983607 0.92050209 0.92405063 0.91358025 0.92561983 
0.9214876 0.92116183 0.92372881] mean value: 0.9201349310782884 key: test_recall value: [0.96153846 0.92592593 0.80769231 0.96153846 1. 0.76923077 0.88461538 0.92307692 0.96153846 0.96153846] mean value: 0.9156695156695157 key: train_recall value: [0.94042553 0.94017094 0.94468085 0.93617021 0.93191489 0.94468085 0.95319149 0.94893617 0.94468085 0.92765957] mean value: 0.9412511365702856 key: test_roc_auc value: [0.92521368 0.86680912 0.88461538 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.9042022792022792 key: train_roc_auc value: [0.92747772 0.9317876 0.92553191 0.92765957 0.92765957 0.92765957 0.93829787 0.93404255 0.93191489 0.92553191] mean value: 0.9297563193307874 key: test_jcc value: [0.86206897 0.78125 0.77777778 0.92592593 0.89655172 0.74074074 0.76666667 0.88888889 0.83333333 0.80645161] mean value: 0.8279655635891732 key: train_jcc value: [0.86666667 0.87301587 0.86381323 0.86614173 0.86561265 0.8671875 0.88537549 0.87795276 0.87401575 0.86166008] mean value: 0.870144172681887 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.27485728 0.28193116 0.35548592 0.28141141 0.28373504 0.31686974 0.28387547 0.28604722 0.28356791 0.33537459] mean value: 0.29831557273864745 key: score_time value: [0.01901937 0.01924253 0.01896763 0.01907969 0.01897073 0.01907015 0.01907754 0.01901555 0.01911354 0.01900244] mean value: 0.019055914878845216 key: test_mcc value: [0.85164138 0.62867836 0.77849894 0.92307692 0.89056356 0.74466871 0.73131034 0.88527041 0.81312325 0.77849894] mean value: 0.8025330823545165 key: train_mcc value: [0.85528213 0.80817284 0.80498447 0.85544308 0.85535013 0.85581519 0.8769849 0.86847048 0.86411148 0.85107154] mean value: 0.8495686235274289 key: test_accuracy value: [0.9245283 0.81132075 0.88461538 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.8985849056603774 key: train_accuracy value: [0.92750533 0.90405117 0.90212766 0.92765957 0.92765957 0.92765957 0.93829787 0.93404255 0.93191489 0.92553191] mean value: 0.924645012021957 key: test_fscore value: [0.92592593 0.82758621 0.875 0.96153846 0.94545455 0.85106383 0.86792453 0.94117647 0.90909091 0.89285714] mean value: 0.8997618020440893 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:148: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:151: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( 
[0.92857143 0.9044586 0.90416667 0.92827004 0.9279661 0.92887029 0.93920335 0.93501048 0.93277311 0.92569002] mean value: 0.9254980097693355 key: test_precision value: [0.89285714 0.77419355 0.95454545 0.96153846 0.89655172 0.95238095 0.85185185 0.96 0.86206897 0.83333333] mean value: 0.8939321434549465 key: train_precision value: [0.91701245 0.89873418 0.88571429 0.92050209 0.92405063 0.91358025 0.92561983 0.9214876 0.92116183 0.92372881] mean value: 0.9151591960239429 key: test_recall value: [0.96153846 0.88888889 0.80769231 0.96153846 1. 
0.76923077 0.88461538 0.92307692 0.96153846 0.96153846] mean value: 0.911965811965812 key: train_recall value: [0.94042553 0.91025641 0.92340426 0.93617021 0.93191489 0.94468085 0.95319149 0.94893617 0.94468085 0.92765957] mean value: 0.9361320240043645 key: test_roc_auc value: [0.92521368 0.80982906 0.88461538 0.96153846 0.94230769 0.86538462 0.86538462 0.94230769 0.90384615 0.88461538] mean value: 0.8985042735042735 key: train_roc_auc value: [0.92747772 0.90406438 0.90212766 0.92765957 0.92765957 0.92765957 0.93829787 0.93404255 0.93191489 0.92553191] mean value: 0.9246435715584652 key: test_jcc value: [0.86206897 0.70588235 0.77777778 0.92592593 0.89655172 0.74074074 0.76666667 0.88888889 0.83333333 0.80645161] mean value: 0.8204287988832908 key: train_jcc value: [0.86666667 0.8255814 0.82509506 0.86614173 0.86561265 0.8671875 0.88537549 0.87795276 0.87401575 0.86166008] mean value: 0.861528907661407 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03136301 0.03624725 0.03802299 0.0339272 0.03786993 0.03451753 0.03579712 0.0273068 0.03700662 0.03485203] mean value: 0.03469104766845703 key: score_time value: [0.01213574 0.01400828 0.01402617 0.01210189 0.01214027 0.01211405 0.0120852 0.01208687 0.01415348 0.01223993] mean value: 0.01270918846130371 key: test_mcc value: [0.8459178 0.92427578 0.85407434 0.84544958 0.80461538 0.76662339 0.80431528 0.88289781 0.72057669 0.68 ] mean value: 0.8128746057870775 key: train_mcc value: [0.85120279 0.85123255 0.86870834 0.86453248 0.85558875 0.86874413 0.85565707 0.8690155 0.86912823 0.87776273] mean value: 0.863157258117954 key: test_accuracy value: [0.92156863 0.96078431 0.92156863 0.92156863 0.90196078 0.88235294 0.90196078 0.94117647 0.86 0.84 ] mean value: 0.9052941176470588 key: train_accuracy value: [0.92560175 0.92560175 0.93435449 0.9321663 0.92778993 0.93435449 0.92778993 0.93435449 0.93449782 0.93886463] mean value: 0.9315375574517691 key: test_fscore value: [0.92307692 0.95833333 0.92592593 0.91666667 0.90196078 0.88888889 0.90566038 0.94339623 0.85714286 0.84 ] mean value: 0.9061051983121906 key: train_fscore value: [0.92576419 0.92608696 0.93449782 0.93304536 0.92778993 0.93449782 0.92810458 0.93506494 0.93506494 0.93913043] mean value: 0.9319046952651103 key: test_precision value: [0.88888889 1. 
0.86206897 0.95652174 0.92 0.85714286 0.88888889 0.92592593 0.875 0.84 ] mean value: 0.9014437265494237 key: train_precision value: [0.92576419 0.92207792 0.93449782 0.92307692 0.92576419 0.93043478 0.92207792 0.92307692 0.92703863 0.93506494] mean value: 0.9268874235466126 key: test_recall value: [0.96 0.92 1. 0.88 0.88461538 0.92307692 0.92307692 0.96153846 0.84 0.84 ] mean value: 0.9132307692307693 key: train_recall value: [0.92576419 0.930131 0.93449782 0.94323144 0.92982456 0.93859649 0.93421053 0.94736842 0.94323144 0.94323144] mean value: 0.9370087336244541 key: test_roc_auc value: [0.92230769 0.96 0.92307692 0.92076923 0.90230769 0.88153846 0.90153846 0.94076923 0.86 0.84 ] mean value: 0.9052307692307693 key: train_roc_auc value: [0.92560139 0.92559182 0.93435417 0.93214204 0.92779438 0.93436375 0.92780395 0.9343829 0.93449782 0.93886463] mean value: 0.9315396843637478 key: test_jcc value: [0.85714286 0.92 0.86206897 0.84615385 0.82142857 0.8 0.82758621 0.89285714 0.75 0.72413793] mean value: 0.8301375521030694 key: train_jcc value: [0.86178862 0.86234818 0.87704918 0.87449393 0.86530612 0.87704918 0.86585366 0.87804878 0.87804878 0.8852459 ] mean value: 0.8725232327405593 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.08443904 0.96518111 0.84693241 0.97842479 0.87160254 0.96375942 0.91595888 0.93315125 0.96591616 0.96043491] mean value: 0.9485800504684448 key: score_time value: [0.01489997 0.01467752 0.01478934 0.02118349 0.01491809 0.01476455 0.01487756 0.01552033 0.01489472 0.01519156] mean value: 0.0155717134475708 key: test_mcc value: [0.8459178 0.88289781 0.88872671 0.80904133 0.88307692 0.76461538 0.80431528 0.92153846 0.76 0.6821865 ] mean value: 0.8242316201242978 key: train_mcc value: [0.90375223 0.89066391 0.91247223 0.95627191 0.88184708 0.90375591 0.89059986 0.89956325 0.90406806 0.96510231] mean value: 0.9108096761378437 key: test_accuracy value: [0.92156863 0.94117647 0.94117647 0.90196078 0.94117647 0.88235294 0.90196078 0.96078431 0.88 0.84 ] mean value: 0.9112156862745098 key: train_accuracy value: [0.95185996 0.9452954 0.95623632 0.97811816 0.94091904 0.95185996 0.9452954 0.94967177 
0.95196507 0.98253275] mean value: 0.9553753834099357 key: test_fscore value: [0.92307692 0.93877551 0.94339623 0.89361702 0.94117647 0.88461538 0.90566038 0.96153846 0.88 0.83333333] mean value: 0.91051897084066 key: train_fscore value: [0.95217391 0.94577007 0.95633188 0.97826087 0.94091904 0.95196507 0.9452954 0.95010846 0.95238095 0.98245614] mean value: 0.9555661785530866 key: test_precision value: [0.88888889 0.95833333 0.89285714 0.95454545 0.96 0.88461538 0.88888889 0.96153846 0.88 0.86956522] mean value: 0.9139232772058858 key: train_precision value: [0.94805195 0.93965517 0.95633188 0.97402597 0.93886463 0.94782609 0.94323144 0.93991416 0.94420601 0.98678414] mean value: 0.9518891441689473 key: test_recall value: [0.96 0.92 1. 0.84 0.92307692 0.88461538 0.92307692 0.96153846 0.88 0.8 ] mean value: 0.9092307692307693 key: train_recall value: [0.95633188 0.95196507 0.95633188 0.98253275 0.94298246 0.95614035 0.94736842 0.96052632 0.96069869 0.97816594] mean value: 0.9593043744733012 key: test_roc_auc value: [0.92230769 0.94076923 0.94230769 0.90076923 0.94153846 0.88230769 0.90153846 0.96076923 0.88 0.84 ] mean value: 0.9112307692307692 key: train_roc_auc value: [0.95185015 0.94528078 0.95623611 0.97810848 0.94092354 0.9518693 0.94529993 0.94969547 0.95196507 0.98253275] mean value: 0.955376158737455 key: test_jcc value: [0.85714286 0.88461538 0.89285714 0.80769231 0.88888889 0.79310345 0.82758621 0.92592593 0.78571429 0.71428571] mean value: 0.8377812162294921 key: train_jcc value: [0.90871369 0.89711934 0.91631799 0.95744681 0.88842975 0.90833333 0.89626556 0.90495868 0.90909091 0.96551724] mean value: 0.9152193308373876 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01459789 0.01174212 0.01132417 0.01122999 0.0112493 0.00993657 0.00992107 0.01049495 0.00995612 0.01000881] mean value: 0.011046099662780761 key: score_time value: [0.01251364 0.01037288 0.00981545 0.0099647 0.0097928 0.00900888 0.00895166 0.00888586 0.0090785 0.00888777] mean value: 0.009727215766906739 key: test_mcc value: [0.7531751 0.6938347 0.64715023 0.64769231 0.70728397 0.60769231 0.77353193 0.5372904 0.68 0.61806423] mean value: 0.6665715181657266 key: train_mcc value: [0.7051679 0.68663317 0.70105568 0.71111913 0.69491764 0.71441791 0.68598516 0.69626736 0.71979689 0.71353415] mean value: 0.7028894982851523 key: test_accuracy value: [0.8627451 0.84313725 0.82352941 0.82352941 0.84313725 0.80392157 0.88235294 0.76470588 0.84 0.8 ] mean value: 0.8287058823529412 key: train_accuracy value: [0.8512035 0.84245077 0.84901532 0.85339168 0.84682713 0.85557987 0.84026258 0.84682713 0.8580786 0.8558952 ] mean value: 0.8499531785997535 key: test_fscore value: [0.8372093 0.82608696 0.81632653 0.82352941 0.82608696 0.80769231 0.89285714 0.75 0.84 0.77272727] mean value: 0.8192515881022734 key: train_fscore value: [0.84474886 0.83710407 0.84210526 0.84526559 0.84162896 0.84792627 0.82903981 0.83944954 0.85057471 0.85067873] mean value: 0.8428521809081373 key: test_precision value: [1. 
0.9047619 0.83333333 0.80769231 0.95 0.80769231 0.83333333 0.81818182 0.84 0.89473684] mean value: 0.8689731847100268 key: train_precision value: [0.88516746 0.8685446 0.88461538 0.89705882 0.86915888 0.89320388 0.88944724 0.87980769 0.89805825 0.88262911] mean value: 0.8847691324095417 key: test_recall value: [0.72 0.76 0.8 0.84 0.73076923 0.80769231 0.96153846 0.69230769 0.84 0.68 ] mean value: 0.7832307692307692 key: train_recall value: [0.80786026 0.80786026 0.80349345 0.79912664 0.81578947 0.80701754 0.77631579 0.80263158 0.80786026 0.8209607 ] mean value: 0.8048915958017314 key: test_roc_auc value: [0.86 0.84153846 0.82307692 0.82384615 0.84538462 0.80384615 0.88076923 0.76615385 0.84 0.8 ] mean value: 0.8284615384615385 key: train_roc_auc value: [0.85129855 0.84252662 0.84911515 0.85351069 0.84675937 0.85547384 0.84012296 0.84673064 0.8580786 0.8558952 ] mean value: 0.8499511606527235 key: test_jcc value: [0.72 0.7037037 0.68965517 0.7 0.7037037 0.67741935 0.80645161 0.6 0.72413793 0.62962963] mean value: 0.6954701108227248 key: train_jcc value: [0.7312253 0.71984436 0.72727273 0.732 0.7265625 0.736 0.708 0.72332016 0.74 0.74015748] mean value: 0.7284382520109796 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01254296 0.01074314 0.0115149 0.01157999 0.01032472 0.01010489 0.01028371 0.01140451 0.01022911 0.01108527] mean value: 0.010981321334838867 key: score_time value: [0.00986099 0.00936627 0.00975776 0.00993228 0.00894356 0.0091629 0.00905657 0.00938177 0.00915146 0.01000309] mean value: 0.009461665153503418 key: test_mcc value: [0.88289781 0.62355907 0.77487835 0.68875274 0.72615385 0.72573276 0.68779719 0.61017022 0.60783067 0.72524067] mean value: 0.705301332855258 key: train_mcc value: [0.75071367 0.74619319 0.75930821 0.77253746 0.74212413 0.74619319 0.72014338 0.75504732 0.77747792 0.76862491] mean value: 0.7538363372752251 key: test_accuracy value: [0.94117647 0.80392157 0.88235294 0.84313725 0.8627451 0.8627451 0.84313725 0.80392157 0.8 0.86 ] mean value: 0.8503137254901961 key: train_accuracy value: [0.87527352 0.87308534 0.87964989 0.88621444 0.87089716 0.87308534 0.85995624 0.87746171 0.88864629 0.88427948] mean value: 0.876854939657726 key: test_fscore value: [0.93877551 0.77272727 0.88888889 0.84615385 0.8627451 0.86792453 0.85185185 0.8 0.81481481 0.85106383] mean value: 0.8494945640769093 key: train_fscore value: [0.87688985 0.87391304 0.87964989 0.88744589 0.86859688 0.8722467 0.85777778 0.87826087 0.88984881 0.88503254] mean value: 0.8769662245721188 key: test_precision value: [0.95833333 0.89473684 0.82758621 0.81481481 0.88 0.85185185 0.82142857 0.83333333 0.75862069 0.90909091] mean value: 0.8549796552509801 key: train_precision value: [0.86752137 0.87012987 0.88157895 0.87982833 0.88235294 0.87610619 0.86936937 0.87068966 0.88034188 0.87931034] mean 
value: 0.8757228896777902 key: test_recall value: [0.92 0.68 0.96 0.88 0.84615385 0.88461538 0.88461538 0.76923077 0.88 0.8 ] mean value: 0.8504615384615385 key: train_recall value: [0.88646288 0.87772926 0.87772926 0.89519651 0.85526316 0.86842105 0.84649123 0.88596491 0.89956332 0.89082969] mean value: 0.878365126790776 key: test_roc_auc value: [0.94076923 0.80153846 0.88384615 0.84384615 0.86307692 0.86230769 0.84230769 0.80461538 0.8 0.86 ] mean value: 0.8502307692307692 key: train_roc_auc value: [0.87524898 0.87307516 0.8796541 0.88619474 0.87086302 0.87307516 0.85992684 0.87748027 0.88864629 0.88427948] mean value: 0.8768444035853827 key: test_jcc value: [0.88461538 0.62962963 0.8 0.73333333 0.75862069 0.76666667 0.74193548 0.66666667 0.6875 0.74074074] mean value: 0.7409708595178561 key: train_jcc value: [0.78076923 0.77606178 0.78515625 0.79766537 0.76771654 0.7734375 0.75097276 0.78294574 0.80155642 0.79377432] mean value: 0.7810055900293517 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.0107193 0.01060081 0.01052904 0.01056099 0.01059866 0.01077914 0.0097878 0.00977492 0.00972342 0.00962353] mean value: 0.010269761085510254 key: score_time value: [0.01285338 0.01341796 0.01517129 0.01273727 0.01308703 0.0128603 0.01233506 0.01229858 0.01202655 0.01252079] mean value: 0.012930822372436524 key: test_mcc value: [0.72984534 0.2668549 0.61017022 0.49076923 0.48998517 0.68875274 0.61017022 0.41140265 0.6 0.5 ] mean value: 0.539795046786689 key: train_mcc value: [0.69010909 0.70240558 0.68928004 0.69803298 0.72889968 0.71606598 0.68082181 0.72482631 0.70759226 0.72995395] mean value: 0.7067987676078814 key: test_accuracy value: [0.8627451 0.62745098 0.80392157 0.74509804 0.74509804 0.84313725 0.80392157 0.70588235 0.8 0.74 ] mean value: 0.7677254901960785 key: train_accuracy value: [0.84463895 0.8512035 0.84463895 0.84901532 0.8643326 0.85776805 0.84026258 0.86214442 0.85371179 0.86462882] mean value: 0.8532344987721326 key: test_fscore value: [0.85106383 0.53658537 0.80769231 0.74509804 0.75471698 0.84 0.8 0.71698113 0.8 0.69767442] mean value: 0.7549812074361085 key: train_fscore value: [0.84116331 0.85152838 0.8453159 0.8496732 0.86222222 0.85458613 0.83741648 0.8590604 0.85209713 0.86160714] mean value: 0.8514670310824969 key: test_precision value: [0.90909091 0.6875 0.77777778 0.73076923 0.74074074 0.875 0.83333333 0.7037037 0.8 0.83333333] mean value: 0.7891249028749029 key: train_precision value: [0.86238532 0.85152838 0.84347826 0.84782609 0.87387387 0.87214612 0.85067873 0.87671233 0.86160714 0.88127854] mean value: 0.8621514789270541 key: 
test_recall value: [0.8 0.44 0.84 0.76 0.76923077 0.80769231 0.76923077 0.73076923 0.8 0.6 ] mean value: 0.7316923076923078 key: train_recall value: [0.8209607 0.85152838 0.84716157 0.85152838 0.85087719 0.8377193 0.8245614 0.84210526 0.84279476 0.84279476] mean value: 0.8412031716846702 key: test_roc_auc value: [0.86153846 0.62384615 0.80461538 0.74538462 0.74461538 0.84384615 0.80461538 0.70538462 0.8 0.74 ] mean value: 0.7673846153846153 key: train_roc_auc value: [0.84469088 0.85120279 0.84463342 0.84900981 0.86430323 0.85772428 0.8402283 0.86210067 0.85371179 0.86462882] mean value: 0.8532233969202482 key: test_jcc value: [0.74074074 0.36666667 0.67741935 0.59375 0.60606061 0.72413793 0.66666667 0.55882353 0.66666667 0.53571429] mean value: 0.613664644780059 key: train_jcc value: [0.72586873 0.74144487 0.73207547 0.73863636 0.7578125 0.74609375 0.72030651 0.75294118 0.74230769 0.75686275] mean value: 0.7414349805409636 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02395844 0.01976371 0.02095556 0.02216721 0.02027059 0.0211978 0.01987791 0.02031898 0.02200055 0.01978111] mean value: 0.021029186248779298 key: score_time value: [0.01175141 0.0115068 0.01230741 0.01178312 0.01150537 0.01177835 0.01278973 0.01174521 0.0115788 0.0126636 ] mean value: 0.011940979957580566 key: test_mcc value: [0.8459178 0.88823731 0.85407434 0.80431528 0.80461538 0.80431528 0.80461538 0.80431528 0.76 0.64051262] mean value: 0.8010918682007854 key: train_mcc value: [0.79431931 0.7943723 0.79881623 0.80306832 0.80307209 0.80746615 0.80307209 0.80306832 0.81223482 0.81662503] mean value: 0.8036114663208501 key: test_accuracy value: [0.92156863 0.94117647 0.92156863 0.90196078 0.90196078 0.90196078 0.90196078 0.90196078 0.88 0.82 ] mean value: 0.8994117647058824 key: train_accuracy value: [0.89715536 0.89715536 0.89934354 0.90153173 0.90153173 0.90371991 0.90153173 0.90153173 0.90611354 0.90829694] mean value: 0.9017911574441249 key: test_fscore value: [0.92307692 0.93617021 0.92592593 0.89795918 0.90196078 0.90566038 0.90196078 0.90566038 0.88 0.81632653] mean value: 0.8994701099398953 key: train_fscore value: [0.89715536 0.89804772 0.89867841 0.90196078 0.90153173 0.9030837 0.90153173 0.9010989 0.90631808 0.90869565] mean value: 0.9018102075636133 key: test_precision value: [0.88888889 1. 
0.86206897 0.91666667 0.92 0.88888889 0.92 0.88888889 0.88 0.83333333] mean value: 0.8998735632183907 key: train_precision value: [0.89912281 0.89224138 0.90666667 0.9 0.89956332 0.90707965 0.89956332 0.9030837 0.90434783 0.9047619 ] mean value: 0.901643056785623 key: test_recall value: [0.96 0.88 1. 0.88 0.88461538 0.92307692 0.88461538 0.92307692 0.88 0.8 ] mean value: 0.9015384615384615 key: train_recall value: [0.89519651 0.90393013 0.89082969 0.90393013 0.90350877 0.89912281 0.90350877 0.89912281 0.90829694 0.91266376] mean value: 0.902011031946679 key: test_roc_auc value: [0.92230769 0.94 0.92307692 0.90153846 0.90230769 0.90153846 0.90230769 0.90153846 0.88 0.82 ] mean value: 0.8994615384615384 key: train_roc_auc value: [0.89715966 0.8971405 0.89936222 0.90152647 0.90153605 0.90370988 0.90153605 0.90152647 0.90611354 0.90829694] mean value: 0.9017907760668046 key: test_jcc value: [0.85714286 0.88 0.86206897 0.81481481 0.82142857 0.82758621 0.82142857 0.82758621 0.78571429 0.68965517] mean value: 0.8187425652253238 key: train_jcc value: [0.81349206 0.81496063 0.816 0.82142857 0.82071713 0.82329317 0.82071713 0.82 0.82868526 0.83266932] mean value: 0.8211963282154172 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.95274234 1.77280283 1.89924169 1.68007755 2.21786213 1.90000033 1.98987079 1.87301683 1.61986661 2.00075674] mean value: 1.8906237840652467 key: score_time value: [0.01277399 0.01594472 0.01440883 0.01712847 0.0163753 0.01307678 0.0148952 0.02330542 0.01241088 0.01312971] mean value: 0.015344929695129395 key: test_mcc value: [0.8459178 0.88289781 0.8459178 0.80904133 0.84307692 0.65224812 0.84307692 0.80431528 0.72057669 0.64051262] mean value: 0.7887581298524704 key: train_mcc value: [0.99124722 0.9956331 1. 0.98695627 0.99128503 0.99563319 0.98688041 0.99563319 0.96966347 1. ] mean value: 0.9912931889984316 key: test_accuracy value: [0.92156863 0.94117647 0.92156863 0.90196078 0.92156863 0.82352941 0.92156863 0.90196078 0.86 0.82 ] mean value: 0.8934901960784314 key: train_accuracy value: [0.99562363 0.99781182 1. 0.99343545 0.99562363 0.99781182 0.99343545 0.99781182 0.98471616 1. ] mean value: 0.9956269767708522 key: test_fscore value: [0.92307692 0.93877551 0.92307692 0.89361702 0.92307692 0.81632653 0.92307692 0.90566038 0.8627451 0.81632653] mean value: 0.8925758760410566 key: train_fscore value: [0.99563319 0.99782135 1. 0.99340659 0.99559471 0.99781182 0.99343545 0.99781182 0.98488121 1. 
] mean value: 0.9956396136064475 key: test_precision value: [0.88888889 0.95833333 0.88888889 0.95454545 0.92307692 0.86956522 0.92307692 0.88888889 0.84615385 0.83333333] mean value: 0.8974751697577784 key: train_precision value: [0.99563319 0.99565217 1. 1. 1. 0.99563319 0.99126638 0.99563319 0.97435897 1. ] mean value: 0.9948177087136647 key: test_recall value: [0.96 0.92 0.96 0.84 0.92307692 0.76923077 0.92307692 0.92307692 0.88 0.8 ] mean value: 0.8898461538461538 key: train_recall value: [0.99563319 1. 1. 0.98689956 0.99122807 1. 0.99561404 1. 0.99563319 1. ] mean value: 0.9965008044127787 key: test_roc_auc value: [0.92230769 0.94076923 0.92230769 0.90076923 0.92153846 0.82461538 0.92153846 0.90153846 0.86 0.82 ] mean value: 0.8935384615384615 key: train_roc_auc value: [0.99562361 0.99780702 1. 0.99344978 0.99561404 0.99781659 0.99344021 0.99781659 0.98471616 1. ] mean value: 0.9956283996016241 key: test_jcc value: [0.85714286 0.88461538 0.85714286 0.80769231 0.85714286 0.68965517 0.85714286 0.82758621 0.75862069 0.68965517] mean value: 0.8086396362258431 key: train_jcc value: [0.99130435 0.99565217 1. 0.98689956 0.99122807 0.99563319 0.98695652 0.99563319 0.97021277 1. 
] mean value: 0.9913519818475776 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04281473 0.02258086 0.01959491 0.02242303 0.01986456 0.02302599 0.02203774 0.0218904 0.02118134 0.02031946] mean value: 0.02357330322265625 key: score_time value: [0.01040149 0.00911903 0.00908256 0.0089097 0.00903273 0.00893998 0.00948405 0.00905752 0.00999212 0.00940347] mean value: 0.009342265129089356 key: test_mcc value: [0.96153846 0.76662339 0.92450033 0.64715023 0.96148034 0.92153846 0.96148034 0.96153846 0.88070485 0.76 ] mean value: 0.8746554854753938 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98039216 0.88235294 0.96078431 0.82352941 0.98039216 0.96078431 0.98039216 0.98039216 0.94 0.88 ] mean value: 0.9369019607843136 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98039216 0.875 0.96153846 0.81632653 0.98113208 0.96153846 0.98113208 0.98039216 0.93877551 0.88 ] mean value: 0.9356227428562136 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.96153846 0.91304348 0.92592593 0.83333333 0.96296296 0.96153846 0.96296296 1. 0.95833333 0.88 ] mean value: 0.9359638919856311 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.84 1. 0.8 1. 0.96153846 1. 0.96153846 0.92 0.88 ] mean value: 0.9363076923076923 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98076923 0.88153846 0.96153846 0.82307692 0.98 0.96076923 0.98 0.98076923 0.94 0.88 ] mean value: 0.9368461538461539 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96153846 0.77777778 0.92592593 0.68965517 0.96296296 0.92592593 0.96296296 0.96153846 0.88461538 0.78571429] mean value: 0.8838617321375942 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.82 Accuracy on Blind test: 0.9 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12462068 0.12486506 0.12026739 0.12022614 0.12115216 0.11893344 0.11995935 0.11945629 0.11925888 0.11902761] mean value: 0.12077670097351074 key: score_time value: [0.01896501 0.01762033 0.01811218 0.01792049 0.01771355 0.01781607 0.01778412 0.01798749 0.01778889 0.01803756] mean value: 0.01797456741333008 key: test_mcc value: [0.92153846 0.78581168 0.82041265 0.88289781 0.76733527 0.68779719 0.80904133 0.73107432 0.6821865 0.72057669] mean value: 0.7808671912348781 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96078431 0.88235294 0.90196078 0.94117647 0.88235294 0.84313725 0.90196078 0.8627451 0.84 0.86 ] mean value: 0.8876470588235295 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96 0.86363636 0.90909091 0.93877551 0.88 0.85185185 0.90909091 0.85714286 0.84615385 0.8627451 ] mean value: 0.8878487345210034 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96 1. 0.83333333 0.95833333 0.91666667 0.82142857 0.86206897 0.91304348 0.81481481 0.84615385] mean value: 0.8925843009508676 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96 0.76 1. 0.92 0.84615385 0.88461538 0.96153846 0.80769231 0.88 0.88 ] mean value: 0.89 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.96076923 0.88 0.90384615 0.94076923 0.88307692 0.84230769 0.90076923 0.86384615 0.84 0.86 ] mean value: 0.8875384615384615 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92307692 0.76 0.83333333 0.88461538 0.78571429 0.74193548 0.83333333 0.75 0.73333333 0.75862069] mean value: 0.8003962766932734 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01009369 0.0099442 0.00995088 0.00998259 0.01005554 0.01011753 0.00999284 0.01003075 0.01016021 0.01010418] mean value: 0.01004323959350586 key: score_time value: [0.00866437 0.00865102 0.00874305 0.008708 0.00879359 0.00871396 0.00872636 0.00874496 0.00876403 0.00880837] mean value: 0.008731770515441894 key: test_mcc value: [0.72984534 0.53444024 0.61648638 0.41306141 0.61017022 0.30559708 0.5301448 0.29366622 0.52167203 0.24174689] mean value: 0.47968306142454376 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.8627451 0.76470588 0.80392157 0.70588235 0.80392157 0.64705882 0.76470588 0.64705882 0.76 0.62 ] mean value: 0.738 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85106383 0.73913043 0.81481481 0.68085106 0.8 0.60869565 0.77777778 0.66666667 0.76923077 0.64150943] mean value: 0.7349740443025836 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 0.80952381 0.75862069 0.72727273 0.83333333 0.7 0.75 0.64285714 0.74074074 0.60714286] mean value: 0.7478582209616692 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8 0.68 0.88 0.64 0.76923077 0.53846154 0.80769231 0.69230769 0.8 0.68 ] mean value: 0.7287692307692308 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86153846 0.76307692 0.80538462 0.70461538 0.80461538 0.64923077 0.76384615 0.64615385 0.76 0.62 ] mean value: 0.7378461538461538 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.74074074 0.5862069 0.6875 0.51612903 0.66666667 0.4375 0.63636364 0.5 0.625 0.47222222] mean value: 0.5868329194803055 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.84472108 1.87050414 1.77609229 1.76353192 1.76929617 1.75256467 1.76388931 1.7507031 1.74432325 1.72300816] mean value: 1.7758634090423584 key: score_time value: [0.10199165 0.10157084 0.09312439 0.09370327 0.0956409 0.09323788 0.1439209 0.09496427 0.09200835 0.09272313] mean value: 0.10028855800628662 key: test_mcc value: [0.96148034 0.88823731 0.88872671 0.84307692 1. 0.88289781 0.92427578 0.92450033 0.84 0.88070485] mean value: 0.9033900045709102 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98039216 0.94117647 0.94117647 0.92156863 1. 0.94117647 0.96078431 0.96078431 0.92 0.94 ] mean value: 0.9507058823529412 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97959184 0.93617021 0.94339623 0.92 1. 0.94339623 0.96296296 0.96 0.92 0.93877551] mean value: 0.9504292975497886 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.89285714 0.92 1. 
0.92592593 0.92857143 1. 0.92 0.95833333] mean value: 0.9545687830687831 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96 0.88 1. 0.92 1. 0.96153846 1. 0.92307692 0.92 0.92 ] mean value: 0.9484615384615385 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98 0.94 0.94230769 0.92153846 1. 0.94076923 0.96 0.96153846 0.92 0.94 ] mean value: 0.9506153846153846 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96 0.88 0.89285714 0.85185185 1. 0.89285714 0.92857143 0.92307692 0.85185185 0.88461538] mean value: 0.9065681725681726 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.96350503 0.96648908 0.9615953 0.95503259 0.93873477 0.94843411 0.95473123 0.98758698 0.94644499 0.94424415] mean value: 0.9566798210144043 key: score_time value: [0.16573906 0.28320813 0.27402806 0.27604485 0.13133049 0.31606078 0.21691132 0.13033724 0.27074957 0.25981092] mean value: 0.23242204189300536 key: test_mcc value: [0.96148034 0.96148034 0.88872671 0.84307692 0.96148034 0.88289781 0.88823731 0.92450033 0.84 0.88070485] mean value: 0.903258494769246 key: train_mcc value: [0.9518693 0.94748334 0.95194315 0.96062133 0.93873056 0.95186838 0.95186838 0.94751863 0.95633188 0.96070785] mean value: 0.9518942794629011 key: test_accuracy value: [0.98039216 0.98039216 0.94117647 0.92156863 0.98039216 0.94117647 0.94117647 0.96078431 0.92 0.94 ] mean value: 0.9507058823529412 key: train_accuracy value: [0.97592998 0.97374179 0.97592998 0.98030635 0.96936543 0.97592998 0.97592998 0.97374179 0.97816594 0.98034934] mean value: 0.975939055736577 key: test_fscore value: [0.97959184 0.97959184 0.94339623 0.92 0.98113208 0.94339623 0.94545455 0.96 0.92 0.93877551] mean value: 0.9511338257429902 key: train_fscore value: [0.97592998 0.97379913 0.97582418 0.98039216 0.96929825 0.97582418 0.97582418 0.97356828 0.97816594 0.98039216] mean value: 0.9759018412370725 key: test_precision value: [1. 1. 0.89285714 0.92 0.96296296 0.92592593 0.89655172 1. 
0.92 0.95833333] mean value: 0.9476631089217297 key: train_precision value: [0.97807018 0.97379913 0.98230088 0.97826087 0.96929825 0.97797357 0.97797357 0.97787611 0.97816594 0.97826087] mean value: 0.9771979353399569 key: test_recall value: [0.96 0.96 1. 0.92 1. 0.96153846 1. 0.92307692 0.92 0.92 ] mean value: 0.9564615384615385 key: train_recall value: [0.97379913 0.97379913 0.96943231 0.98253275 0.96929825 0.97368421 0.97368421 0.96929825 0.97816594 0.98253275] mean value: 0.9746226921014326 key: test_roc_auc value: [0.98 0.98 0.94230769 0.92153846 0.98 0.94076923 0.94 0.96153846 0.92 0.94 ] mean value: 0.9506153846153846 key: train_roc_auc value: [0.97593465 0.97374167 0.97594423 0.98030146 0.96936528 0.97592507 0.97592507 0.97373209 0.97816594 0.98034934] mean value: 0.9759384815751169 key: test_jcc value: [0.96 0.96 0.89285714 0.85185185 0.96296296 0.89285714 0.89655172 0.92307692 0.85185185 0.88461538] mean value: 0.9076624984211191 key: train_jcc value: [0.95299145 0.94893617 0.9527897 0.96153846 0.94042553 0.9527897 0.9527897 0.94849785 0.95726496 0.96153846] mean value: 0.9529561988250692 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), 
('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01163793 0.01159668 0.01061368 0.01021719 0.01039076 0.01039505 0.01055193 0.01151991 0.01037455 0.01054192] mean value: 0.010783958435058593 key: score_time value: [0.01002407 0.00981021 0.00904036 0.00957513 0.00927067 0.00931168 0.00938725 0.00996947 0.00909567 0.0091548 ] mean value: 0.009463930130004882 key: test_mcc value: [0.88289781 0.62355907 0.77487835 0.68875274 0.72615385 0.72573276 0.68779719 0.61017022 0.60783067 0.72524067] mean value: 0.705301332855258 key: train_mcc value: [0.75071367 0.74619319 0.75930821 0.77253746 0.74212413 0.74619319 0.72014338 0.75504732 0.77747792 0.76862491] mean value: 0.7538363372752251 key: test_accuracy value: [0.94117647 0.80392157 0.88235294 0.84313725 0.8627451 0.8627451 0.84313725 0.80392157 0.8 0.86 ] mean value: 0.8503137254901961 key: train_accuracy value: [0.87527352 0.87308534 0.87964989 0.88621444 0.87089716 0.87308534 0.85995624 0.87746171 0.88864629 0.88427948] mean value: 0.876854939657726 key: test_fscore value: [0.93877551 0.77272727 0.88888889 0.84615385 0.8627451 0.86792453 0.85185185 0.8 0.81481481 0.85106383] mean value: 0.8494945640769093 key: train_fscore value: [0.87688985 0.87391304 0.87964989 0.88744589 0.86859688 0.8722467 0.85777778 0.87826087 0.88984881 0.88503254] mean value: 0.8769662245721188 key: test_precision value: [0.95833333 0.89473684 0.82758621 0.81481481 0.88 0.85185185 0.82142857 0.83333333 0.75862069 0.90909091] mean value: 0.8549796552509801 key: train_precision value: [0.86752137 0.87012987 0.88157895 0.87982833 0.88235294 0.87610619 0.86936937 0.87068966 0.88034188 0.87931034] mean 
value: 0.8757228896777902 key: test_recall value: [0.92 0.68 0.96 0.88 0.84615385 0.88461538 0.88461538 0.76923077 0.88 0.8 ] mean value: 0.8504615384615385 key: train_recall value: [0.88646288 0.87772926 0.87772926 0.89519651 0.85526316 0.86842105 0.84649123 0.88596491 0.89956332 0.89082969] mean value: 0.878365126790776 key: test_roc_auc value: [0.94076923 0.80153846 0.88384615 0.84384615 0.86307692 0.86230769 0.84230769 0.80461538 0.8 0.86 ] mean value: 0.8502307692307692 key: train_roc_auc value: [0.87524898 0.87307516 0.8796541 0.88619474 0.87086302 0.87307516 0.85992684 0.87748027 0.88864629 0.88427948] mean value: 0.8768444035853827 key: test_jcc value: [0.88461538 0.62962963 0.8 0.73333333 0.75862069 0.76666667 0.74193548 0.66666667 0.6875 0.74074074] mean value: 0.7409708595178561 key: train_jcc value: [0.78076923 0.77606178 0.78515625 0.79766537 0.76771654 0.7734375 0.75097276 0.78294574 0.80155642 0.79377432] mean value: 0.7810055900293517 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09083033 0.08632159 0.07055283 0.07104707 0.07519531 0.07939005 0.07854795 0.09603834 0.08671165 0.07549286] mean value: 0.08101279735565185 key: score_time value: [0.01125097 0.01128316 0.01079941 0.01076365 0.01111865 0.01134443 0.01275802 0.01227999 0.01084495 0.01092196] mean value: 0.011336517333984376 key: test_mcc value: [1. 1. 0.84307692 0.84307692 0.92153846 0.96153846 0.96148034 1. 0.88070485 0.76 ] mean value: 0.9171415955282479 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 1. 0.92156863 0.92156863 0.96078431 0.98039216 0.98039216 1. 0.94 0.88 ] mean value: 0.9584705882352941 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 1. 0.92 0.92 0.96153846 0.98039216 0.98113208 1. 0.93877551 0.88 ] mean value: 0.9581838204076987 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.92 0.92 0.96153846 1. 0.96296296 1. 0.95833333 0.88 ] mean value: 0.9602834757834758 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.92 0.92 0.96153846 0.96153846 1. 1. 0.92 0.88 ] mean value: 0.9563076923076923 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 1. 0.92153846 0.92153846 0.96076923 0.98076923 0.98 1. 0.94 0.88 ] mean value: 0.9584615384615385 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 1. 0.85185185 0.85185185 0.92592593 0.96153846 0.96296296 1. 0.88461538 0.78571429] mean value: 0.9224460724460725 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.06496072 0.06503797 0.07527375 0.07898951 0.1082499 0.08013272 0.07932186 0.04319263 0.06920362 0.0507133 ] mean value: 0.07150759696960449 key: score_time value: [0.01696396 0.01866412 0.01862216 0.01909184 0.03613448 0.01893544 0.01222348 0.01733971 0.01222539 0.01219845] mean value: 0.018239903450012206 key: test_mcc value: [0.88307692 0.80904133 0.80990051 0.76662339 0.80461538 0.72573276 0.72984534 0.76662339 0.76 0.60192927] mean value: 0.765738828717061 key: train_mcc value: [0.90817148 0.90830894 0.92560955 0.90426654 0.89956325 0.91693003 0.90386163 0.91250886 0.91710927 0.91703931] mean value: 0.9113368869349576 key: test_accuracy value: [0.94117647 0.90196078 0.90196078 0.88235294 0.90196078 0.8627451 0.8627451 0.88235294 0.88 0.8 ] mean value: 0.8817254901960785 key: train_accuracy value: [0.95404814 0.95404814 0.96280088 0.95185996 0.94967177 0.95842451 0.95185996 0.95623632 0.95851528 
0.95851528] mean value: 0.9555980239458018 key: test_fscore value: [0.94117647 0.89361702 0.90566038 0.875 0.90196078 0.86792453 0.87272727 0.88888889 0.88 0.80769231] mean value: 0.8834647651147403 key: train_fscore value: [0.95444685 0.95464363 0.96296296 0.9527897 0.95010846 0.95860566 0.95217391 0.95633188 0.95878525 0.95860566] mean value: 0.9559453974783592 key: test_precision value: [0.92307692 0.95454545 0.85714286 0.91304348 0.92 0.85185185 0.82758621 0.85714286 0.88 0.77777778] mean value: 0.8762167406695143 key: train_precision value: [0.94827586 0.94444444 0.96086957 0.93670886 0.93991416 0.95238095 0.94396552 0.95217391 0.95258621 0.95652174] mean value: 0.948784122427322 key: test_recall value: [0.96 0.84 0.96 0.84 0.88461538 0.88461538 0.92307692 0.92307692 0.88 0.84 ] mean value: 0.8935384615384615 key: train_recall value: [0.96069869 0.9650655 0.9650655 0.96943231 0.96052632 0.96491228 0.96052632 0.96052632 0.9650655 0.96069869] mean value: 0.9632517428943538 key: test_roc_auc value: [0.94153846 0.90076923 0.90307692 0.88153846 0.90230769 0.86230769 0.86153846 0.88153846 0.88 0.8 ] mean value: 0.8814615384615384 key: train_roc_auc value: [0.95403356 0.95402398 0.96279591 0.95182142 0.94969547 0.95843867 0.95187888 0.95624569 0.95851528 0.95851528] mean value: 0.9555964146173294 key: test_jcc value: [0.88888889 0.80769231 0.82758621 0.77777778 0.82142857 0.76666667 0.77419355 0.8 0.78571429 0.67741935] mean value: 0.7927367608290856 key: train_jcc value: [0.91286307 0.91322314 0.92857143 0.90983607 0.90495868 0.92050209 0.90871369 0.91631799 0.92083333 0.92050209] mean value: 0.9156321584878045 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), 
('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 
'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01775408 0.01069021 0.01090527 0.01043916 0.01051974 0.01007056 0.00997186 0.01019859 0.01107121 0.01059556] mean value: 0.011221623420715332 key: score_time value: [0.01215219 0.00924611 0.00937891 0.00882244 0.00891948 0.0093646 0.00921845 0.00940609 0.00914979 0.00949597] mean value: 0.00951540470123291 key: test_mcc value: [0.80431528 0.85322916 0.76733527 0.68615385 0.72615385 0.72573276 0.72573276 0.61648638 0.72057669 0.68887476] mean value: 0.73145907619674 key: train_mcc value: [0.72878597 0.70713347 0.73311115 0.77687755 0.68978499 0.7418496 0.7166604 0.71585171 0.76445458 0.74270515] mean value: 0.7317214561206101 key: test_accuracy value: [0.90196078 0.92156863 0.88235294 0.84313725 0.8627451 0.8627451 0.8627451 0.80392157 0.86 0.84 ] mean value: 0.8641176470588235 key: train_accuracy value: [0.8643326 0.85339168 0.86652079 0.88840263 0.84463895 0.87089716 0.85776805 0.85776805 0.88209607 0.87117904] mean value: 0.8656995021642954 key: test_fscore value: [0.89795918 0.91304348 0.88461538 0.84 0.8627451 0.86792453 0.86792453 0.79166667 0.8627451 0.82608696] mean value: 0.8614710922420334 key: train_fscore value: [0.86343612 0.85144124 0.86593407 0.88791209 0.84116331 0.86975717 0.85327314 0.85523385 0.88053097 0.8691796 ] mean value: 0.8637861569276664 key: test_precision value: [0.91666667 1. 
0.85185185 0.84 0.88 0.85185185 0.85185185 0.86363636 0.84615385 0.9047619 ] mean value: 0.8806774336774337 key: train_precision value: [0.87111111 0.86486486 0.87168142 0.89380531 0.85844749 0.87555556 0.87906977 0.86877828 0.89237668 0.88288288] mean value: 0.8758573358261803 key: test_recall value: [0.88 0.84 0.92 0.84 0.84615385 0.88461538 0.88461538 0.73076923 0.88 0.76 ] mean value: 0.8466153846153845 key: train_recall value: [0.8558952 0.83842795 0.86026201 0.88209607 0.8245614 0.86403509 0.82894737 0.84210526 0.86899563 0.8558952 ] mean value: 0.8521221175208764 key: test_roc_auc value: [0.90153846 0.92 0.88307692 0.84307692 0.86307692 0.86230769 0.86230769 0.80538462 0.86 0.84 ] mean value: 0.8640769230769231 key: train_roc_auc value: [0.86435111 0.8534245 0.86653451 0.88841646 0.84459511 0.87088217 0.85770513 0.85773385 0.88209607 0.87117904] mean value: 0.8656917949896575 key: test_jcc value: [0.81481481 0.84 0.79310345 0.72413793 0.75862069 0.76666667 0.76666667 0.65517241 0.75862069 0.7037037 ] mean value: 0.7581507024265645 key: train_jcc value: [0.75968992 0.74131274 0.76356589 0.79841897 0.72586873 0.76953125 0.74409449 0.74708171 0.78656126 0.76862745] mean value: 0.7604752419520732 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01461196 0.01674128 0.02004933 0.01781726 0.01789832 0.02177954 0.01803398 0.0175209 0.01656842 0.02079844] mean value: 0.01818194389343262 key: score_time value: [0.01018262 0.01185966 0.01840734 0.01195478 0.01202202 0.01203704 0.01195407 0.01193404 0.01192832 0.01798701] mean value: 0.013026690483093262 key: test_mcc value: [0.80990051 0.81912621 0.82041265 0.76733527 0.78581168 0.73107432 0.80431528 0.78762135 0.76 0.6821865 ] mean value: 0.7767783786155215 key: train_mcc value: [0.80373177 0.8497961 0.8591878 0.81151328 0.70283343 0.88786716 0.85514592 0.74595689 0.84755764 0.83552208] mean value: 0.8199112066271259 key: test_accuracy value: [0.90196078 0.90196078 0.90196078 0.88235294 0.88235294 0.8627451 0.90196078 0.88235294 0.88 0.84 ] mean value: 0.8837647058823529 key: train_accuracy value: [0.89715536 0.92341357 0.92778993 0.90153173 0.83588621 0.94310722 0.92560175 0.85995624 0.92358079 0.91484716] mean value: 0.9052869960727357 key: test_fscore value: [0.90566038 0.88888889 0.90909091 0.88461538 0.89655172 0.85714286 0.90566038 0.86956522 0.88 0.84615385] mean value: 0.8843329582138102 key: train_fscore value: [0.90466531 0.92027335 0.93110647 0.90835031 0.85659656 0.94117647 0.92165899 0.83838384 0.92239468 0.91958763] mean value: 0.9064193601059057 key: test_precision value: [0.85714286 1. 0.83333333 0.85185185 0.8125 0.91304348 0.88888889 1. 
0.88 0.81481481] mean value: 0.8851575224292616 key: train_precision value: [0.84469697 0.96190476 0.892 0.85114504 0.75932203 0.97196262 0.97087379 0.98809524 0.93693694 0.87109375] mean value: 0.9048031131930347 key: test_recall value: [0.96 0.8 1. 0.92 1. 0.80769231 0.92307692 0.76923077 0.88 0.88 ] mean value: 0.894 key: train_recall value: [0.97379913 0.88209607 0.97379913 0.97379913 0.98245614 0.9122807 0.87719298 0.72807018 0.90829694 0.97379913] mean value: 0.9185589519650655 key: test_roc_auc value: [0.90307692 0.9 0.90384615 0.88307692 0.88 0.86384615 0.90153846 0.88461538 0.88 0.84 ] mean value: 0.884 key: train_roc_auc value: [0.89698728 0.92350418 0.92768904 0.90137325 0.83620624 0.94303991 0.92549605 0.85966828 0.92358079 0.91484716] mean value: 0.9052392170382287 key: test_jcc value: [0.82758621 0.8 0.83333333 0.79310345 0.8125 0.75 0.82758621 0.76923077 0.78571429 0.73333333] mean value: 0.7932387583680687 key: train_jcc value: [0.82592593 0.85232068 0.87109375 0.83208955 0.74916388 0.88888889 0.85470085 0.72173913 0.85596708 0.85114504] mean value: 0.8303034773250645 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0145123 0.02034068 0.02262974 0.02118397 0.01858521 0.01969814 0.02044988 0.02166414 0.02026725 0.01864958] mean value: 0.019798088073730468 key: score_time value: [0.01100039 0.01206136 0.01206374 0.01205349 0.01194215 0.0121634 0.01195788 0.01196051 0.0119977 0.01203704] mean value: 0.011923766136169434 key: test_mcc value: [0.80990051 0.84544958 0.85407434 0.73878883 0.80461538 0.61413747 0.84544958 0.88872671 0.76244374 0.64465837] mean value: 0.7808244519509732 key: train_mcc value: [0.87909672 0.86883646 0.90426654 0.76612696 0.85658732 0.8392754 0.89935264 0.87549121 0.88352087 0.82858789] mean value: 0.8601142007805195 key: test_accuracy value: [0.90196078 0.92156863 0.92156863 0.8627451 0.90196078 0.80392157 0.92156863 0.94117647 0.88 0.82 ] mean value: 0.8876470588235295 key: train_accuracy value: [0.93873085 0.93435449 0.95185996 0.87089716 0.92778993 0.91466083 0.94967177 0.93654267 0.94104803 0.91048035] mean value: 0.9276036042922802 key: test_fscore value: [0.90566038 0.91666667 0.92592593 0.84444444 0.90196078 0.82142857 0.92592593 0.93877551 0.88461538 0.83018868] mean value: 0.88955922701285 key: train_fscore value: [0.94067797 0.93506494 0.9527897 0.85286783 0.92933619 0.92057026 0.94967177 0.93394077 0.94267516 0.91615542] mean value: 0.9273750009738929 key: test_precision value: [0.85714286 0.95652174 0.86206897 0.95 0.92 0.76666667 0.89285714 1. 
0.85185185 0.78571429] mean value: 0.8842823508880481 key: train_precision value: [0.91358025 0.92703863 0.93670886 0.99418605 0.90794979 0.85931559 0.94759825 0.97156398 0.91735537 0.86153846] mean value: 0.9236835228699787 key: test_recall value: [0.96 0.88 1. 0.76 0.88461538 0.88461538 0.96153846 0.88461538 0.92 0.88 ] mean value: 0.9015384615384615 key: train_recall value: [0.96943231 0.94323144 0.96943231 0.74672489 0.95175439 0.99122807 0.95175439 0.89912281 0.96943231 0.97816594] mean value: 0.9370278863096606 key: test_roc_auc value: [0.90307692 0.92076923 0.92307692 0.86076923 0.90230769 0.80230769 0.92076923 0.94230769 0.88 0.82 ] mean value: 0.8875384615384615 key: train_roc_auc value: [0.93866353 0.93433502 0.95182142 0.87116946 0.92784226 0.91482801 0.94967632 0.93646097 0.94104803 0.91048035] mean value: 0.9276325365816287 key: test_jcc value: [0.82758621 0.84615385 0.86206897 0.73076923 0.82142857 0.6969697 0.86206897 0.88461538 0.79310345 0.70967742] mean value: 0.8034441735498465 key: train_jcc value: [0.888 0.87804878 0.90983607 0.74347826 0.868 0.85283019 0.90416667 0.87606838 0.89156627 0.84528302] mean value: 0.8657277622273594 MCC on Blind test: 0.75 Accuracy on Blind test: 0.86 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.20065451 0.18761659 0.18804526 0.18726611 0.18589187 0.18081141 0.18000293 0.18104124 0.18397188 0.17888808] mean value: 0.18541898727416992 key: score_time value: [0.01699638 0.0164814 0.01690269 0.01632905 0.01682019 0.01586986 0.01592231 0.0159359 0.01681352 0.01568103] mean value: 0.016375231742858886 key: test_mcc value: [0.96153846 1. 0.88307692 0.84544958 0.96148034 0.92450033 0.96148034 1. 0.92 0.80064077] mean value: 0.9258166744058197 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98039216 1. 0.94117647 0.92156863 0.98039216 0.96078431 0.98039216 1. 0.96 0.9 ] mean value: 0.9624705882352941 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98039216 1. 0.94117647 0.91666667 0.98113208 0.96 0.98113208 1. 0.96 0.90196078] mean value: 0.9622460229374769 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96153846 1. 0.92307692 0.95652174 0.96296296 1. 0.96296296 1. 0.96 0.88461538] mean value: 0.961167843428713 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.96 0.88 1. 0.92307692 1. 1. 0.96 0.92 ] mean value: 0.9643076923076923 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98076923 1. 0.94153846 0.92076923 0.98 0.96153846 0.98 1. 0.96 0.9 ] mean value: 0.9624615384615385 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.96153846 1. 0.88888889 0.84615385 0.96296296 0.92307692 0.96296296 1. 0.92307692 0.82142857] mean value: 0.929008954008954 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.98 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores.
This probably means too few estimators were used to compute any reliable oob estimates.
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06497097 0.07499814 0.08114505 0.0727787 0.07150054 0.08404374 0.08080506 0.05995893 0.06555724 0.06107831] mean value: 0.07168366909027099 key: score_time value: [0.01859593 0.03967738 0.02885842 0.04054928 0.02583647 0.03430438 0.03502369 0.02353096 0.0248487 0.02320862] mean value: 0.02944338321685791 key: test_mcc value: [1. 0.80904133 0.88872671 0.76662339 1. 0.96148034 0.96148034 1. 0.92 0.80064077] mean value: 0.9107992872362165 key: train_mcc value: [0.99124722 0.98688016 0.9956331 0.99128503 0.97812763 0.97812763 0.98249445 0.98695553 0.99126638 0.99126638] mean value: 0.9873283508508414 key: test_accuracy value: [1. 0.90196078 0.94117647 0.88235294 1. 0.98039216 0.98039216 1. 0.96 0.9 ] mean value: 0.9546274509803921 key: train_accuracy value: [0.99562363 0.99343545 0.99781182 0.99562363 0.98905908 0.98905908 0.99124726 0.99343545 0.99563319 0.99563319] mean value: 0.9936561780359856 key: test_fscore value: [1. 0.89361702 0.94339623 0.875 1. 0.98113208 0.98113208 1. 0.96 0.90196078] mean value: 0.9536238182948812 key: train_fscore value: [0.99563319 0.99346405 0.99782135 0.99565217 0.98905908 0.98905908 0.99122807 0.99337748 0.99563319 0.99563319] mean value: 0.9936560855826678 key: test_precision value: [1. 0.95454545 0.89285714 0.91304348 1. 0.96296296 0.96296296 1. 0.96 0.88461538] mean value: 0.9530987386204778 key: train_precision value: [0.99563319 0.99130435 0.99565217 0.99134199 0.98689956 0.98689956 0.99122807 1. 0.99563319 0.99563319] mean value: 0.9930225273212893 key: test_recall value: [1. 0.84 1. 
0.84 1. 1. 1. 1. 0.96 0.92] mean value: 0.956 key: train_recall value: [0.99563319 0.99563319 1. 1. 0.99122807 0.99122807 0.99122807 0.98684211 0.99563319 0.99563319] mean value: 0.9943059066881177 key: test_roc_auc value: [1. 0.90076923 0.94230769 0.88153846 1. 0.98 0.98 1. 0.96 0.9 ] mean value: 0.9544615384615385 key: train_roc_auc value: [0.99562361 0.99343063 0.99780702 0.99561404 0.98906382 0.98906382 0.99124722 0.99342105 0.99563319 0.99563319] mean value: 0.9936537577568375 key: test_jcc value: [1. 0.80769231 0.89285714 0.77777778 1. 0.96296296 0.96296296 1. 0.92307692 0.82142857] mean value: 0.9148758648758649 key: train_jcc value: [0.99130435 0.98701299 0.99565217 0.99134199 0.97835498 0.97835498 0.9826087 0.98684211 0.99130435 0.99130435] mean value: 0.9874080953371571 MCC on Blind test: 0.9 Accuracy on Blind test: 0.95 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, 
missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.1736908 0.16033888 0.1746788 0.15446329 0.17124557 0.15994787 0.13885593 0.10975528 0.1811161 0.09505463] mean value: 0.15191471576690674 key: score_time value: [0.02411532 0.02435613 0.01529169 0.02446175 0.02488351 0.02440786 0.01523852 0.02894855 0.02414536 0.0149796 ] mean value: 0.022082829475402833 key: test_mcc value: [0.80904133 0.60498161 0.75558816 0.76662339 0.69568237 0.52923077 0.76662339 0.52923077 0.56044854 0.60192927] mean value: 0.6619379576424396 key: train_mcc value: [0.99128536 0.98695627 0.98695627 0.98695627 0.98695553 0.99128503 0.98695553 0.98695553 0.98698426 0.99130418] mean value: 0.9882594241358311 key: test_accuracy value: [0.90196078 0.78431373 0.8627451 0.88235294 0.84313725 0.76470588 0.88235294 0.76470588 0.78 0.8 ] mean value: 0.8266274509803921 key: train_accuracy value: [0.99562363 0.99343545 0.99343545 0.99343545 0.99343545 0.99562363 0.99343545 0.99343545 0.99344978 0.99563319] mean value: 0.9940942925668638 key: test_fscore value: [0.89361702 0.73170732 0.87719298 0.875 0.83333333 0.76923077 0.88888889 0.76923077 0.78431373 0.79166667] mean value: 0.821418147364653 key: train_fscore value: [0.99561404 0.99340659 0.99340659 0.99340659 0.99337748 0.99559471 0.99337748 0.99337748 0.99340659 0.99561404] mean value: 0.9940581607789326 key: test_precision value: [0.95454545 0.9375 0.78125 0.91304348 0.90909091 0.76923077 0.85714286 0.76923077 0.76923077 0.82608696] mean value: 0.8486351963254137 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.84 0.6 1. 
0.84 0.76923077 0.76923077 0.92307692 0.76923077 0.8 0.76 ] mean value: 0.8070769230769231 key: train_recall value: [0.99126638 0.98689956 0.98689956 0.98689956 0.98684211 0.99122807 0.98684211 0.98684211 0.98689956 0.99126638] mean value: 0.9881885390331724 key: test_roc_auc value: [0.90076923 0.78076923 0.86538462 0.88153846 0.84461538 0.76461538 0.88153846 0.76461538 0.78 0.8 ] mean value: 0.8263846153846154 key: train_roc_auc value: [0.99563319 0.99344978 0.99344978 0.99344978 0.99342105 0.99561404 0.99342105 0.99342105 0.99344978 0.99563319] mean value: 0.9940942695165862 key: test_jcc value: [0.80769231 0.57692308 0.78125 0.77777778 0.71428571 0.625 0.8 0.625 0.64516129 0.65517241] mean value: 0.7008262580794561 key: train_jcc value: [0.99126638 0.98689956 0.98689956 0.98689956 0.98684211 0.99122807 0.98684211 0.98684211 0.98689956 0.99126638] mean value: 0.9881885390331724 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.72235799 0.70944214 0.70883203 0.71152782 0.71902704 0.71618891 0.7211957 0.71571803 0.71437287 0.70587111] mean value: 0.7144533634185791 key: score_time value: [0.00964093 0.00959468 0.00950909 0.00943208 0.01016808 0.00956535 0.01000834 0.00979233 0.00962782 0.00996852] mean value: 0.009730720520019531 key: test_mcc value: [1. 0.85322916 0.88872671 0.84307692 0.92427578 0.96148034 1. 1. 0.88070485 0.80064077] mean value: 0.9152134526595563 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.92156863 0.94117647 0.92156863 0.96078431 0.98039216 1. 1. 0.94 0.9 ] mean value: 0.9565490196078431 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.91304348 0.94339623 0.92 0.96296296 0.98113208 1. 1. 0.93877551 0.90196078] mean value: 0.9561271037628433 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.89285714 0.92 0.92857143 0.96296296 1. 1. 0.95833333 0.88461538] mean value: 0.9547340252340253 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.84 1. 0.92 1. 1. 1. 1. 0.92 0.92] mean value: 0.96 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.92 0.94230769 0.92153846 0.96 0.98 1. 1. 0.94 0.9 ] mean value: 0.9563846153846154 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.84 0.89285714 0.85185185 0.92857143 0.96296296 1. 1. 
0.88461538 0.82142857] mean value: 0.9182287342287342 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), 
('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03209758 0.04663944 0.04156089 0.03064513 0.03051949 0.03812504 0.03040957 0.03081679 0.03012967 0.03015852] mean value: 0.034110212326049806 key: score_time value: [0.01268816 0.01741695 0.01369047 0.01662827 0.01425338 0.01279664 0.01463175 0.01472878 0.01473212 0.0146606 ] mean value: 0.014622712135314941 key: test_mcc value: [ 0.50162374 0.33282012 0.4779765 0.38593446 0.38074981 -0.08910647 0.67109832 0.41306141 0.52678658 0.42874646] mean value: 0.40296909364640227 key: train_mcc value: [0.89198163 0.97407901 0.94806064 0.63869807 0.73237152 0.51794578 0.96944796 0.93638281 0.93650904 0.94475499] mean value: 0.8490231449196334 key: test_accuracy value: [0.74509804 0.66666667 0.7254902 0.68627451 0.66666667 0.47058824 0.82352941 0.70588235 0.76 0.7 ] mean value: 0.6950196078431372 key: train_accuracy value: [0.94310722 0.9868709 0.97374179 0.78993435 0.84901532 0.71115974 0.98468271 0.96717724 0.96724891 0.97161572] mean value: 0.9144553906720304 key: test_fscore value: [0.76363636 0.65306122 0.75862069 0.71428571 0.73846154 0.59701493 0.84745763 0.72727273 0.77777778 0.74576271] mean 
value: 0.7323351299935275 key: train_fscore value: [0.94628099 0.98672566 0.97424893 0.8267148 0.86857143 0.7755102 0.98454746 0.96815287 0.96828753 0.97239915] mean value: 0.9271439021368936 key: test_precision value: [0.7 0.66666667 0.66666667 0.64516129 0.61538462 0.48780488 0.75757576 0.68965517 0.72413793 0.64705882] mean value: 0.6600111801642755 key: train_precision value: [0.89803922 1. 0.95780591 0.70461538 0.76767677 0.63333333 0.99111111 0.9382716 0.93852459 0.94628099] mean value: 0.877565890643361 key: test_recall value: [0.84 0.64 0.88 0.8 0.92307692 0.76923077 0.96153846 0.76923077 0.84 0.88 ] mean value: 0.8303076923076923 key: train_recall value: [1. 0.97379913 0.99126638 1. 1. 1. 0.97807018 1. 1. 1. ] mean value: 0.9943135677622003 key: test_roc_auc value: [0.74692308 0.66615385 0.72846154 0.68846154 0.66153846 0.46461538 0.82076923 0.70461538 0.76 0.7 ] mean value: 0.6941538461538461 key: train_roc_auc value: [0.94298246 0.98689956 0.97370336 0.78947368 0.84934498 0.71179039 0.98466828 0.96724891 0.96724891 0.97161572] mean value: 0.9144976250670344 key: test_jcc value: [0.61764706 0.48484848 0.61111111 0.55555556 0.58536585 0.42553191 0.73529412 0.57142857 0.63636364 0.59459459] mean value: 0.5817740898924696 key: train_jcc value: [0.89803922 0.97379913 0.94979079 0.70461538 0.76767677 0.63333333 0.96956522 0.9382716 0.93852459 0.94628099] mean value: 0.8719897027157442 MCC on Blind test: 0.45 Accuracy on Blind test: 0.69 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), 
('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02209425 0.01628399 0.02629995 0.04093146 0.03840542 0.03341603 0.021245 0.02528167 0.02654409 0.03870654] mean value: 0.028920841217041016 key: score_time value: [0.02987242 0.01222944 0.01866579 0.01888323 0.01911664 0.01218963 0.01224184 0.01220989 0.01884508 0.01884842] mean value: 0.017310237884521483 key: test_mcc value: [0.8459178 0.92427578 0.85407434 0.84544958 0.80461538 0.80431528 0.84307692 0.80904133 0.76 0.68 ] mean value: 0.8170766413736327 key: train_mcc value: [0.86024417 0.8425731 0.86433893 0.8559713 0.84690379 0.87309431 0.84274962 0.86454544 0.8735707 0.86470302] mean value: 0.8588694380310397 key: test_accuracy value: [0.92156863 0.96078431 0.92156863 0.92156863 0.90196078 0.90196078 0.92156863 0.90196078 0.88 0.84 ] mean value: 0.9072941176470588 key: train_accuracy value: [0.92997812 0.92122538 0.9321663 0.92778993 0.92341357 0.93654267 0.92122538 0.9321663 0.93668122 0.93231441] mean value: 0.9293503291831099 key: test_fscore value: [0.92307692 0.95833333 0.92592593 0.91666667 0.90196078 0.90566038 0.92307692 0.90909091 0.88 0.84 ] mean value: 0.9083791842842898 key: train_fscore value: [0.93103448 0.92207792 0.93246187 0.92903226 0.92374728 0.93654267 0.92207792 0.93275488 0.93736501 0.93275488] mean value: 0.9299849177077446 key: test_precision value: [0.88888889 1. 
0.86206897 0.95652174 0.92 0.88888889 0.92307692 0.86206897 0.88 0.84 ] mean value: 0.9021514371019619 key: train_precision value: [0.91914894 0.91416309 0.93043478 0.91525424 0.91774892 0.93449782 0.91025641 0.92274678 0.92735043 0.92672414] mean value: 0.9218325537192356 key: test_recall value: [0.96 0.92 1. 0.88 0.88461538 0.92307692 0.92307692 0.96153846 0.88 0.84 ] mean value: 0.9172307692307693 key: train_recall value: [0.94323144 0.930131 0.93449782 0.94323144 0.92982456 0.93859649 0.93421053 0.94298246 0.94759825 0.93886463] mean value: 0.9383168620240557 key: test_roc_auc value: [0.92230769 0.96 0.92307692 0.92076923 0.90230769 0.90153846 0.92153846 0.90076923 0.88 0.84 ] mean value: 0.9072307692307692 key: train_roc_auc value: [0.92994905 0.92120585 0.93216119 0.92775607 0.92342756 0.93654715 0.92125373 0.93218992 0.93668122 0.93231441] mean value: 0.929348617176128 key: test_jcc value: [0.85714286 0.92 0.86206897 0.84615385 0.82142857 0.82758621 0.85714286 0.83333333 0.78571429 0.72413793] mean value: 0.8334708854364027 key: train_jcc value: [0.87096774 0.85542169 0.87346939 0.86746988 0.8582996 0.88065844 0.85542169 0.87398374 0.88211382 0.87398374] mean value: 0.8691789714871334 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.21736836 0.28004289 0.27122521 0.16093874 0.27278447 0.15021992 0.3257041 0.41062498 0.28041005 0.27491283] mean value: 0.26442315578460696 key: score_time value: [0.01900768 0.01911998 0.01891971 0.01225471 0.01899242 0.01237059 0.02265048 0.02372003 0.01925325 0.02483606] mean value: 0.019112491607666017 key: test_mcc value: [0.8459178 0.92427578 0.80990051 0.84544958 0.80461538 0.80431528 0.84307692 0.80904133 0.76 0.68 ] mean value: 0.8126592592815737 key: train_mcc value: [0.86024417 0.8425731 0.91250886 0.8559713 0.84690379 0.87309431 0.84274962 0.86454544 0.8735707 0.86470302] mean value: 0.8636864306196921 key: test_accuracy value: [0.92156863 0.96078431 0.90196078 0.92156863 0.90196078 0.90196078 0.92156863 0.90196078 0.88 0.84 ] mean value: 0.9053333333333333 key: train_accuracy value: [0.92997812 0.92122538 0.95623632 0.92778993 0.92341357 0.93654267 0.92122538 0.9321663 0.93668122 0.93231441] mean value: 0.9317573313712937 key: test_fscore value: [0.92307692 0.95833333 0.90566038 0.91666667 0.90196078 0.90566038 0.92307692 0.90909091 0.88 0.84 ] mean value: 0.9063526294275461 key: train_fscore value: [0.93103448 0.92207792 0.95614035 0.92903226 0.92374728 0.93654267 0.92207792 0.93275488 0.93736501 0.93275488] mean value: 0.9323527654316295 key: test_precision value: [0.88888889 1. 0.85714286 0.95652174 0.92 0.88888889 0.92307692 0.86206897 0.88 0.84 ] mean value: 0.9016588262645234 key: train_precision value: [0.91914894 0.91416309 0.96035242 0.91525424 0.91774892 0.93449782 0.91025641 0.92274678 0.92735043 0.92672414] mean value: 0.9248243177491149 key: test_recall value: [0.96 0.92 0.96 0.88 0.88461538 0.92307692 0.92307692 0.96153846 0.88 0.84 ] mean value: 0.9132307692307693 key: train_recall value: [0.94323144 0.930131 0.95196507 0.94323144 0.92982456 0.93859649 0.93421053 0.94298246 0.94759825 0.93886463] mean value: 0.9400635869148855 key: test_roc_auc value: [0.92230769 0.96 0.90307692 0.92076923 0.90230769 0.90153846 0.92153846 0.90076923 0.88 0.84 ] mean value: 0.9052307692307692 key: train_roc_auc value: [0.92994905 0.92120585 0.95624569 0.92775607 0.92342756 0.93654715 0.92125373 0.93218992 0.93668122 0.93231441] mean value: 0.9317570673408412 key: test_jcc value: [0.85714286 0.92 0.82758621 0.84615385 0.82142857 0.82758621 0.85714286 0.83333333 0.78571429 0.72413793] mean value: 0.8300226095743337 key: train_jcc value: [0.87096774 0.85542169 0.91596639 0.86746988 0.8582996 0.88065844 0.85542169 0.87398374 0.88211382 0.87398374] mean value: 0.8734286713670855 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:168: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:171: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03605151 0.03672171 0.03721905 0.03759956 0.03047872 0.03578138 0.03592539 0.036762 0.03830004 0.03602743] mean value: 0.03608667850494385 key: score_time value: [0.0120666 0.01405454 0.01406908 0.01425576 0.01207781 0.01208544 0.01441956 0.01217318 0.01456451 0.01270866] mean value: 0.013247513771057129 key: test_mcc value: [0.92704716 0.88746439 0.57735027 0.84866842 0.79056942 0.74466871 0.84615385 0.84615385 0.77849894 0.84866842] mean value: 0.8095243434058762 key: train_mcc value: [0.86365953 0.86787786 0.88530679 0.85535013 0.86386107 0.87262489 0.87284634 0.86395495 0.86815585 0.86386107] mean value: 0.867749849139245 key: test_accuracy value: [0.96226415 0.94339623 0.78846154 0.92307692 0.88461538 0.86538462 0.92307692 0.92307692 0.88461538 0.92307692] mean value: 0.9021044992743106 key: train_accuracy value: [0.93176972 0.93390192 0.94255319 0.92765957 0.93191489 0.93617021 0.93617021 0.93191489 0.93404255 0.93191489] mean value: 0.9338012067322959 key: test_fscore value: [0.96 0.94339623 0.79245283 0.92592593 0.89655172 0.85106383 0.92307692 0.92307692 0.89285714 0.92592593] mean value: 0.9034327451391779 key: train_fscore value: [0.93248945 0.93418259 0.94315789 0.9279661 0.93220339 0.93697479 0.93723849 0.93248945 0.93446089 0.93220339] mean value: 0.9343366440868982 key: test_precision value: [1. 
0.96153846 0.77777778 0.89285714 0.8125 0.95238095 0.92307692 0.92307692 0.83333333 0.89285714] mean value: 0.8969398656898657 key: train_precision value: [0.92468619 0.92827004 0.93333333 0.92405063 0.92827004 0.9253112 0.9218107 0.92468619 0.92857143 0.92827004] mean value: 0.9267259809243651 key: test_recall value: [0.92307692 0.92592593 0.80769231 0.96153846 1. 0.76923077 0.92307692 0.92307692 0.96153846 0.96153846] mean value: 0.9156695156695157 key: train_recall value: [0.94042553 0.94017094 0.95319149 0.93191489 0.93617021 0.94893617 0.95319149 0.94042553 0.94042553 0.93617021] mean value: 0.9421022004000728 key: test_roc_auc value: [0.96153846 0.94373219 0.78846154 0.92307692 0.88461538 0.86538462 0.92307692 0.92307692 0.88461538 0.92307692] mean value: 0.9020655270655271 key: train_roc_auc value: [0.93175123 0.93391526 0.94255319 0.92765957 0.93191489 0.93617021 0.93617021 0.93191489 0.93404255 0.93191489] mean value: 0.9338006910347336 key: test_jcc value: [0.92307692 0.89285714 0.65625 0.86206897 0.8125 0.74074074 0.85714286 0.85714286 0.80645161 0.86206897] mean value: 0.8270300064898229 key: train_jcc value: [0.87351779 0.87649402 0.89243028 0.86561265 0.87301587 0.88142292 0.88188976 0.87351779 0.87698413 0.87301587] mean value: 0.8767901085829304 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.94098043 0.98036623 0.98186111 0.90004635 1.07260489 1.11815834 1.04107022 0.89132261 0.89728475 1.02238417] mean value: 0.9846079111099243 key: score_time value: [0.0144639 0.01220942 0.01224375 0.0165391 0.01481962 0.0146389 0.01722312 0.01506114 0.01477838 0.02237964] mean value: 0.01543569564819336 key: test_mcc value: [0.92704716 0.85164138 0.61538462 0.88527041 0.89056356 0.77849894 0.84615385 0.80829038 0.77849894 0.80829038] mean value: 0.8189639617732613 key: train_mcc value: [0.89778103 0.82535469 0.84262186 0.89790486 0.90641581 0.90233192 0.90252815 0.90651431 0.90220118 0.82571883] mean value: 0.8809372623148111 key: test_accuracy value: [0.96226415 0.9245283 0.80769231 0.94230769 0.94230769 0.88461538 0.92307692 0.90384615 0.88461538 0.90384615] mean value: 0.907910014513788 key: train_accuracy value: [0.94882729 0.91257996 0.9212766 0.94893617 0.95319149 0.95106383 0.95106383 0.95319149 0.95106383 0.91276596] mean value: 0.9403960440956313 key: test_fscore value: [0.96 0.92307692 0.80769231 0.94339623 0.94545455 0.875 0.92307692 0.90196078 0.89285714 0.90566038] mean value: 0.9078175230245152 key: train_fscore value: [0.94936709 0.91331924 0.9217759 0.94915254 0.95338983 0.95157895 0.95178197 0.9535865 0.95137421 0.91368421] mean value: 0.9409010432532758 key: test_precision value: [1. 
0.96 0.80769231 0.92592593 0.89655172 0.95454545 0.92307692 0.92 0.83333333 0.88888889] mean value: 0.9110014557600765 key: train_precision value: [0.94142259 0.90376569 0.91596639 0.94514768 0.94936709 0.94166667 0.93801653 0.94560669 0.94537815 0.90416667] mean value: 0.9330504147086066 key: test_recall value: [0.92307692 0.88888889 0.80769231 0.96153846 1. 0.80769231 0.92307692 0.88461538 0.96153846 0.92307692] mean value: 0.9081196581196581 key: train_recall value: [0.95744681 0.92307692 0.92765957 0.95319149 0.95744681 0.96170213 0.96595745 0.96170213 0.95744681 0.92340426] mean value: 0.9489034369885434 key: test_roc_auc value: [0.96153846 0.92521368 0.80769231 0.94230769 0.94230769 0.88461538 0.92307692 0.90384615 0.88461538 0.90384615] mean value: 0.9079059829059829 key: train_roc_auc value: [0.94880887 0.91260229 0.9212766 0.94893617 0.95319149 0.95106383 0.95106383 0.95319149 0.95106383 0.91276596] mean value: 0.9403964357155847 key: test_jcc value: [0.92307692 0.85714286 0.67741935 0.89285714 0.89655172 0.77777778 0.85714286 0.82142857 0.80645161 0.82758621] mean value: 0.8337435028202548 key: train_jcc value: [0.90361446 0.84046693 0.85490196 0.90322581 0.91093117 0.90763052 0.908 0.91129032 0.90725806 0.84108527] mean value: 0.8888404505729317 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01424456 0.01084495 0.01039648 0.00999761 0.01003218 0.01008916 0.01021504 0.01027536 0.01029849 0.00998974] mean value: 0.01063835620880127 key: score_time value: [0.01386261 0.00955462 0.00925779 0.00907087 0.00900865 0.00900984 0.00913215 0.00919223 0.00887728 0.00897932] mean value: 0.009594535827636719 key: test_mcc value: [0.82552431 0.46464327 0.54494926 0.69230769 0.73131034 0.58789635 0.57735027 0.77151675 0.54006172 0.61538462] mean value: 0.6350944580966691 key: train_mcc value: [0.66639366 0.66929675 0.7289762 0.68358593 0.69117257 0.68473679 0.69424587 0.68473679 0.70419643 0.65795145] mean value: 0.6865292430336383 key: test_accuracy value: [0.90566038 0.71698113 0.76923077 0.84615385 0.86538462 0.78846154 0.78846154 0.88461538 0.76923077 0.80769231] mean value: 0.8141872278664731 key: train_accuracy value: [0.8315565 0.8336887 0.86170213 0.84042553 0.84468085 0.84042553 0.84680851 0.84042553 0.85106383 0.82765957] mean value: 0.8418436691920338 key: test_fscore value: [0.89361702 0.66666667 0.78571429 0.84615385 0.86792453 0.76595745 0.78431373 0.88 0.76 0.80769231] mean value: 0.8058039828104295 key: train_fscore value: [0.82326622 0.82666667 0.85260771 0.83296214 0.8388521 0.83146067 0.85 0.83146067 0.84513274 0.81959911] mean value: 0.8352008031680325 key: test_precision value: [1. 
0.83333333 0.73333333 0.84615385 0.85185185 0.85714286 0.8 0.91666667 0.79166667 0.80769231] mean value: 0.8437840862840863 key: train_precision value: [0.86792453 0.86111111 0.91262136 0.87383178 0.87155963 0.88095238 0.83265306 0.88095238 0.88018433 0.85981308] mean value: 0.8721603646403393 key: test_recall value: [0.80769231 0.55555556 0.84615385 0.84615385 0.88461538 0.69230769 0.76923077 0.84615385 0.73076923 0.80769231] mean value: 0.7786324786324786 key: train_recall value: [0.78297872 0.79487179 0.8 0.79574468 0.80851064 0.78723404 0.86808511 0.78723404 0.81276596 0.78297872] mean value: 0.8020403709765412 key: test_roc_auc value: [0.90384615 0.72008547 0.76923077 0.84615385 0.86538462 0.78846154 0.78846154 0.88461538 0.76923077 0.80769231] mean value: 0.8143162393162393 key: train_roc_auc value: [0.8316603 0.83360611 0.86170213 0.84042553 0.84468085 0.84042553 0.84680851 0.84042553 0.85106383 0.82765957] mean value: 0.8418457901436625 key: test_jcc value: [0.80769231 0.5 0.64705882 0.73333333 0.76666667 0.62068966 0.64516129 0.78571429 0.61290323 0.67741935] mean value: 0.6796638943076161 key: train_jcc value: [0.69961977 0.70454545 0.743083 0.71374046 0.72243346 0.71153846 0.73913043 0.71153846 0.73180077 0.69433962] mean value: 0.717176989523702 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01102686 0.01034999 0.01119971 0.01043487 0.0116086 0.0105567 0.01021433 0.01092291 0.01056623 0.01027584] mean value: 0.010715603828430176 key: score_time value: [0.00961232 0.00919366 0.00915337 0.00907326 0.00980663 0.00913072 0.00897837 0.0090673 0.00913048 0.00903487] mean value: 0.009218096733093262 key: test_mcc value: [0.92704716 0.59688314 0.50336201 0.73131034 0.70064905 0.69436507 0.57735027 0.76923077 0.66628253 0.65433031] mean value: 0.6820810650750807 key: train_mcc value: [0.73987525 0.70625194 0.78298581 0.71521098 0.7745312 0.74043224 0.69894261 0.74470782 0.76195052 0.74048587] mean value: 0.7405374249725031 key: test_accuracy value: [0.96226415 0.79245283 0.75 0.86538462 0.84615385 0.84615385 0.78846154 0.88461538 0.82692308 0.82692308] mean value: 0.8389332365747459 key: train_accuracy value: [0.86993603 0.85287846 0.89148936 0.85744681 0.88723404 0.87021277 0.84893617 0.87234043 0.88085106 0.87021277] mean value: 0.8701537903189221 key: test_fscore value: [0.96 0.7755102 0.76363636 0.86792453 0.85714286 0.84 0.78431373 0.88461538 0.84210526 0.82352941] mean value: 0.8398777738190921 key: train_fscore value: [0.87048832 0.8496732 0.89171975 0.85529158 0.88794926 0.87048832 0.84463895 0.87179487 0.87931034 0.86937901] mean value: 0.8690733611272227 key: test_precision value: [1. 
0.86363636 0.72413793 0.85185185 0.8 0.875 0.8 0.88461538 0.77419355 0.84 ] mean value: 0.841343507952518 key: train_precision value: [0.86864407 0.86666667 0.88983051 0.86842105 0.88235294 0.86864407 0.86936937 0.87553648 0.89082969 0.875 ] mean value: 0.8755294848921722 key: test_recall value: [0.92307692 0.7037037 0.80769231 0.88461538 0.92307692 0.80769231 0.76923077 0.88461538 0.92307692 0.80769231] mean value: 0.8434472934472934 key: train_recall value: [0.87234043 0.83333333 0.89361702 0.84255319 0.89361702 0.87234043 0.8212766 0.86808511 0.86808511 0.86382979] mean value: 0.8629078014184397 key: test_roc_auc value: [0.96153846 0.79415954 0.75 0.86538462 0.84615385 0.84615385 0.78846154 0.88461538 0.82692308 0.82692308] mean value: 0.8390313390313391 key: train_roc_auc value: [0.8699309 0.85283688 0.89148936 0.85744681 0.88723404 0.87021277 0.84893617 0.87234043 0.88085106 0.87021277] mean value: 0.8701491180214584 key: test_jcc value: [0.92307692 0.63333333 0.61764706 0.76666667 0.75 0.72413793 0.64516129 0.79310345 0.72727273 0.7 ] mean value: 0.7280399378806105 key: train_jcc value: [0.77067669 0.73863636 0.8045977 0.74716981 0.79847909 0.77067669 0.73106061 0.77272727 0.78461538 0.76893939] mean value: 0.7687579004360319 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00984716 0.01065707 0.01068068 0.01061821 0.01073027 0.0107739 0.01069117 0.01094604 0.01067162 0.01062965] mean value: 0.010624575614929199 key: score_time value: [0.01390767 0.01305842 0.01274514 0.01643538 0.01310968 0.01333404 0.01317954 0.01302528 0.01261353 0.01284766] mean value: 0.013425636291503906 key: test_mcc value: [0.63760132 0.52028554 0.23145502 0.38575837 0.70064905 0.54006172 0.31139958 0.77151675 0.73568294 0.46291005] mean value: 0.5297320348876765 key: train_mcc value: [0.72710906 0.7106402 0.76601987 0.71495188 0.73197454 0.71066404 0.70669657 0.66043608 0.71980093 0.71490009] mean value: 0.7163193248281741 key: test_accuracy value: [0.81132075 0.75471698 0.61538462 0.69230769 0.84615385 0.76923077 0.65384615 0.88461538 0.86538462 0.73076923] mean value: 0.7623730043541365 key: train_accuracy value: [0.86353945 0.85501066 0.88297872 0.85744681 0.86595745 0.85531915 0.85319149 0.82978723 0.85957447 0.85744681] mean value: 0.8580252234269382 key: test_fscore value: [0.7826087 0.73469388 0.6 0.68 0.85714286 0.76 0.625 0.88888889 0.85714286 0.72 ] mean value: 0.7505477176377797 key: train_fscore value: [0.86324786 0.85152838 0.88222698 0.85653105 0.86509636 0.8559322 0.85097192 0.82532751 0.85652174 0.85714286] mean value: 0.856452687007534 key: test_precision value: [0.9 0.81818182 0.625 0.70833333 0.8 0.79166667 0.68181818 0.85714286 0.91304348 0.75 ] mean value: 0.7845186335403727 key: train_precision value: [0.86695279 0.87053571 0.88793103 0.86206897 0.87068966 0.85232068 0.86403509 0.84753363 0.87555556 0.85897436] mean value: 
0.8656597468799392 key: test_recall value: [0.69230769 0.66666667 0.57692308 0.65384615 0.92307692 0.73076923 0.57692308 0.92307692 0.80769231 0.69230769] mean value: 0.7243589743589743 key: train_recall value: [0.85957447 0.83333333 0.87659574 0.85106383 0.85957447 0.85957447 0.83829787 0.80425532 0.83829787 0.85531915] mean value: 0.8475886524822696 key: test_roc_auc value: [0.80911681 0.75641026 0.61538462 0.69230769 0.84615385 0.76923077 0.65384615 0.88461538 0.86538462 0.73076923] mean value: 0.7623219373219373 key: train_roc_auc value: [0.86354792 0.85496454 0.88297872 0.85744681 0.86595745 0.85531915 0.85319149 0.82978723 0.85957447 0.85744681] mean value: 0.8580214584469904 key: test_jcc value: [0.64285714 0.58064516 0.42857143 0.51515152 0.75 0.61290323 0.45454545 0.8 0.75 0.5625 ] mean value: 0.6097173928222316 key: train_jcc value: [0.7593985 0.74144487 0.78927203 0.74906367 0.76226415 0.74814815 0.7406015 0.70260223 0.74904943 0.75 ] mean value: 0.7491844527216088 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02203298 0.02313685 0.02299261 0.02083182 0.02209663 0.01996708 0.02413058 0.02186084 0.02108955 0.0205934 ] mean value: 0.02187323570251465 key: score_time value: [0.01181936 0.01257968 0.01236892 0.01224875 0.01212788 0.01159978 0.01305223 0.0125947 0.01200342 0.01176119] mean value: 0.012215590476989746 key: test_mcc value: [0.92704716 0.81688878 0.61538462 0.88527041 0.74466871 0.77849894 0.80829038 0.84615385 0.74466871 0.80829038] mean value: 0.7975161942581178 key: train_mcc value: [0.78688615 0.79976356 0.82130634 0.79149653 0.80857653 0.80035515 0.8000652 0.79155386 0.80857653 0.80000724] mean value: 0.8008587091662202 key: test_accuracy value: [0.96226415 0.90566038 0.80769231 0.94230769 0.86538462 0.88461538 0.90384615 0.92307692 0.86538462 0.90384615] mean value: 0.8964078374455733 key: train_accuracy value: [0.89339019 0.89978678 0.9106383 0.89574468 0.90425532 0.9 0.9 0.89574468 0.90425532 0.9 ] mean value: 0.900381527015379 key: test_fscore value: [0.96 0.90196078 0.80769231 0.94339623 0.87719298 0.875 0.90196078 0.92307692 0.87719298 0.90566038] mean value: 0.8973133368082548 key: train_fscore value: [0.89451477 0.90063425 0.91101695 0.89596603 0.90364026 0.90146751 0.90063425 0.89640592 0.90364026 0.90021231] mean value: 0.9008132498798447 key: test_precision value: [1. 
0.95833333 0.80769231 0.92592593 0.80645161 0.95454545 0.92 0.92307692 0.80645161 0.88888889] mean value: 0.8991366059269286 key: train_precision value: [0.88702929 0.89121339 0.907173 0.8940678 0.90948276 0.88842975 0.89495798 0.8907563 0.90948276 0.89830508] mean value: 0.8970898109982571 key: test_recall value: [0.92307692 0.85185185 0.80769231 0.96153846 0.96153846 0.80769231 0.88461538 0.92307692 0.96153846 0.92307692] mean value: 0.9005698005698006 key: train_recall value: [0.90212766 0.91025641 0.91489362 0.89787234 0.89787234 0.91489362 0.90638298 0.90212766 0.89787234 0.90212766] mean value: 0.9046426623022368 key: test_roc_auc value: [0.96153846 0.90669516 0.80769231 0.94230769 0.86538462 0.88461538 0.90384615 0.92307692 0.86538462 0.90384615] mean value: 0.8964387464387464 key: train_roc_auc value: [0.89337152 0.89980906 0.9106383 0.89574468 0.90425532 0.9 0.9 0.89574468 0.90425532 0.9 ] mean value: 0.9003818876159301 key: test_jcc value: [0.92307692 0.82142857 0.67741935 0.89285714 0.78125 0.77777778 0.82142857 0.85714286 0.78125 0.82758621] mean value: 0.8161217405447105 key: train_jcc value: [0.80916031 0.81923077 0.83657588 0.81153846 0.82421875 0.82061069 0.81923077 0.81226054 0.82421875 0.81853282] mean value: 0.819557772278408 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.02500224 2.07535386 2.03847885 2.11725354 2.08513308 2.02912474 1.15581083 2.13639879 2.00898528 2.09049392] mean value: 1.9762035131454467 key: score_time value: [0.01247501 0.01451373 0.01420856 0.01249385 0.01248074 0.02131391 0.01252007 0.02289152 0.01492286 0.01478481] mean value: 0.01526050567626953 key: test_mcc value: [0.92704716 0.77350427 0.54006172 0.82305489 0.9258201 0.77849894 0.80829038 0.88527041 0.77151675 0.73131034] mean value: 0.7964374979189948 key: train_mcc value: [1. 0.99150739 1. 1. 0.99148936 0.9957537 0.95320012 1. 0.9957537 1. ] mean value: 0.9927704266438734 key: test_accuracy value: [0.96226415 0.88679245 0.76923077 0.90384615 0.96153846 0.88461538 0.90384615 0.94230769 0.88461538 0.86538462] mean value: 0.89644412191582 key: train_accuracy value: [1. 0.99573561 1. 1. 0.99574468 0.99787234 0.97659574 1. 0.99787234 1. ] mean value: 0.9963820714058885 key: test_fscore value: [0.96 0.88888889 0.77777778 0.9122807 0.96296296 0.875 0.90196078 0.94117647 0.88888889 0.86792453] mean value: 0.8976861003476753 key: train_fscore value: [1. 0.99574468 1. 1. 0.99574468 0.9978678 0.97664544 1. 0.99787686 1. ] mean value: 0.9963879458533711 key: test_precision value: [1. 
0.88888889 0.75 0.83870968 0.92857143 0.95454545 0.92 0.96 0.85714286 0.85185185] mean value: 0.8949710158419836 key: train_precision value: [1. 0.99152542 1. 1. 0.99574468 1. 0.97457627 1. 0.99576271 1. ] mean value: 0.9957609087630724 key: test_recall value: [0.92307692 0.88888889 0.80769231 1. 1. 0.80769231 0.88461538 0.92307692 0.92307692 0.88461538] mean value: 0.9042735042735043 key: train_recall value: [1. 1. 1. 1. 0.99574468 0.99574468 0.9787234 1. 1. 1. ] mean value: 0.9970212765957447 key: test_roc_auc value: [0.96153846 0.88675214 0.76923077 0.90384615 0.96153846 0.88461538 0.90384615 0.94230769 0.88461538 0.86538462] mean value: 0.8963675213675214 key: train_roc_auc value: [1. 0.99574468 1. 1. 0.99574468 0.99787234 0.97659574 1. 0.99787234 1. ] mean value: 0.9963829787234042 key: test_jcc value: [0.92307692 0.8 0.63636364 0.83870968 0.92857143 0.77777778 0.82142857 0.88888889 0.8 0.76666667] mean value: 0.8181483570193248 key: train_jcc value: [1. 0.99152542 1. 1. 0.99152542 0.99574468 0.95435685 1. 0.99576271 1. 
] mean value: 0.9928915086646127 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02680755 0.02203512 0.02136636 0.02250838 0.02036333 0.02217436 0.02208591 0.02092457 0.02369666 0.02297091] mean value: 0.022493314743041993 key: score_time value: [0.01222968 0.00935984 0.00914311 0.00899458 0.0090909 0.00908637 0.00928926 0.00905418 0.0090971 0.00970721] mean value: 0.009505224227905274 key: test_mcc value: [0.85164138 0.92450142 0.77849894 0.88527041 0.88527041 0.88527041 0.84615385 0.88527041 0.96225045 0.84866842] mean value: 0.8752796119384096 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 0.96226415 0.88461538 0.94230769 0.94230769 0.94230769 0.92307692 0.94230769 0.98076923 0.92307692] mean value: 0.9367561683599419 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92592593 0.96296296 0.89285714 0.94339623 0.94339623 0.94117647 0.92307692 0.94339623 0.98113208 0.92592593] mean value: 0.9383246106054097 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.89285714 0.96296296 0.83333333 0.92592593 0.92592593 0.96 0.92307692 0.92592593 0.96296296 0.89285714] mean value: 0.9205828245828246 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 0.96296296 0.96153846 0.96153846 0.96153846 0.92307692 0.92307692 0.96153846 1. 0.96153846] mean value: 0.9578347578347579 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.92521368 0.96225071 0.88461538 0.94230769 0.94230769 0.94230769 0.92307692 0.94230769 0.98076923 0.92307692] mean value: 0.9368233618233619 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86206897 0.92857143 0.80645161 0.89285714 0.89285714 0.88888889 0.85714286 0.89285714 0.96296296 0.86206897] mean value: 0.8846727110075274 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.93 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12310386 0.12195277 0.1237545 0.12339377 0.12358236 0.12080479 0.12089777 0.12545443 0.12270761 0.12189317] mean value: 0.12275450229644776 key: score_time value: [0.01917434 0.01886702 0.01860046 0.01904464 0.01814556 0.01823425 0.0187676 0.018188 0.01808381 0.01885533] mean value: 0.01859610080718994 key: test_mcc value: [0.85122386 0.70692282 0.50336201 0.88527041 0.85634884 0.81312325 0.84615385 0.88527041 0.82305489 0.89056356] mean value: 0.8061293898911333 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 0.8490566 0.75 0.94230769 0.92307692 0.90384615 0.92307692 0.94230769 0.90384615 0.94230769] mean value: 0.9004354136429609 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92 0.84 0.76363636 0.94339623 0.92857143 0.89795918 0.92307692 0.94339623 0.9122807 0.94545455] mean value: 0.9017771598997305 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95833333 0.91304348 0.72413793 0.92592593 0.86666667 0.95652174 0.92307692 0.92592593 0.83870968 0.89655172] mean value: 0.8928893324911849 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88461538 0.77777778 0.80769231 0.96153846 1. 0.84615385 0.92307692 0.96153846 1. 1. ] mean value: 0.9162393162393162 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.92378917 0.85042735 0.75 0.94230769 0.92307692 0.90384615 0.92307692 0.94230769 0.90384615 0.94230769] mean value: 0.9004985754985755 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.85185185 0.72413793 0.61764706 0.89285714 0.86666667 0.81481481 0.85714286 0.89285714 0.83870968 0.89655172] mean value: 0.8253236867605774 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0104301 0.01049423 0.01046538 0.01045418 0.01051521 0.01134562 0.01081777 0.01171875 0.01168561 0.01107073] mean value: 0.010899758338928223 key: score_time value: [0.00904679 0.00917101 0.00891232 0.0090096 0.00909114 0.00902176 0.00980735 0.00962543 0.00923777 0.00924516] mean value: 0.009216833114624023 key: test_mcc value: [0.69957726 0.44368795 0.38575837 0.55339859 0.70064905 0.54494926 0.43112399 0.57735027 0.34641016 0.34848139] mean value: 0.5031386296029412 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.8490566 0.71698113 0.69230769 0.76923077 0.84615385 0.76923077 0.71153846 0.78846154 0.67307692 0.67307692] mean value: 0.7489114658925979 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84 0.69387755 0.7037037 0.73913043 0.85714286 0.75 0.68085106 0.79245283 0.66666667 0.69090909] mean value: 0.7414734198243802 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.875 0.77272727 0.67857143 0.85 0.8 0.81818182 0.76190476 0.77777778 0.68 0.65517241] mean value: 0.7669335472956162 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.80769231 0.62962963 0.73076923 0.65384615 0.92307692 0.69230769 0.61538462 0.80769231 0.65384615 0.73076923] mean value: 0.7245014245014245 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8482906 0.71866097 0.69230769 0.76923077 0.84615385 0.76923077 0.71153846 0.78846154 0.67307692 0.67307692] mean value: 0.749002849002849 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.72413793 0.53125 0.54285714 0.5862069 0.75 0.6 0.51612903 0.65625 0.5 0.52777778] mean value: 0.5934608780479191 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.85724831 1.80352688 1.76003718 1.77659369 1.78128886 1.81115103 1.7989316 1.79322004 1.84129739 1.78404427] mean value: 1.8007339239120483 key: score_time value: [0.09947538 0.09356856 0.09668612 0.09286475 0.10176277 0.10127997 0.10069108 0.09549236 0.09286833 0.10066128] mean value: 0.09753506183624268 key: test_mcc value: [0.92450142 0.88746439 0.84615385 0.88527041 0.96225045 0.9258201 0.88527041 0.92307692 0.9258201 1. ] mean value: 0.9165628054905913 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96226415 0.94339623 0.92307692 0.94230769 0.98076923 0.96153846 0.94230769 0.96153846 0.96153846 1. ] mean value: 0.9578737300435414 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96153846 0.94339623 0.92307692 0.94339623 0.98113208 0.96 0.94117647 0.96153846 0.96296296 1. ] mean value: 0.9578217808006931 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.96153846 0.96153846 0.92307692 0.92592593 0.96296296 1. 0.96 0.96153846 0.92857143 1. ] mean value: 0.9585152625152625 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 0.92592593 0.92307692 0.96153846 1. 0.92307692 0.92307692 0.96153846 1. 1. ] mean value: 0.957977207977208 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96225071 0.94373219 0.92307692 0.94230769 0.98076923 0.96153846 0.94230769 0.96153846 0.96153846 1. ] mean value: 0.957905982905983 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92592593 0.89285714 0.85714286 0.89285714 0.96296296 0.92307692 0.88888889 0.92592593 0.92857143 1. ] mean value: 0.9198209198209198 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.93789101 0.9519825 0.9610076 0.95415044 1.06602931 0.96349025 0.95963335 1.00183535 0.93617368 0.97349167] mean value: 0.9705685138702392 key: score_time value: [0.27528143 0.22235203 0.27172637 0.26692533 0.2285161 0.21771598 0.26816726 0.23777795 0.24989247 0.26056457] mean value: 0.24989194869995118 key: test_mcc value: [0.92450142 0.78307508 0.80829038 0.88527041 0.9258201 0.9258201 0.88527041 0.92307692 0.9258201 1. ] mean value: 0.8986944924445692 key: train_mcc value: [0.95309971 0.94457073 0.95744681 0.95326917 0.94893617 0.95320012 0.96171083 0.95748148 0.94893617 0.94897054] mean value: 0.9527621739445413 key: test_accuracy value: [0.96226415 0.88679245 0.90384615 0.94230769 0.96153846 0.96153846 0.94230769 0.96153846 0.96153846 1. ] mean value: 0.948367198838897 key: train_accuracy value: [0.97654584 0.97228145 0.9787234 0.97659574 0.97446809 0.97659574 0.98085106 0.9787234 0.97446809 0.97446809] mean value: 0.9763720909132151 key: test_fscore value: [0.96153846 0.88 0.90566038 0.94339623 0.96296296 0.96 0.94117647 0.96153846 0.96296296 1. ] mean value: 0.947923592336467 key: train_fscore value: [0.97664544 0.97216274 0.9787234 0.9764454 0.97446809 0.97654584 0.98081023 0.97863248 0.97446809 0.97435897] mean value: 0.9763260676507729 key: test_precision value: [0.96153846 0.95652174 0.88888889 0.92592593 0.92857143 1. 0.96 0.96153846 0.92857143 1. 
] mean value: 0.951155633416503 key: train_precision value: [0.97457627 0.97424893 0.9787234 0.98275862 0.97446809 0.97863248 0.98290598 0.98283262 0.97446809 0.97854077] mean value: 0.9782155245479209 key: test_recall value: [0.96153846 0.81481481 0.92307692 0.96153846 1. 0.92307692 0.92307692 0.96153846 1. 1. ] mean value: 0.9468660968660969 key: train_recall value: [0.9787234 0.97008547 0.9787234 0.97021277 0.97446809 0.97446809 0.9787234 0.97446809 0.97446809 0.97021277] mean value: 0.9744553555191853 key: test_roc_auc value: [0.96225071 0.88817664 0.90384615 0.94230769 0.96153846 0.96153846 0.94230769 0.96153846 0.96153846 1. ] mean value: 0.9485042735042736 key: train_roc_auc value: [0.97654119 0.97227678 0.9787234 0.97659574 0.97446809 0.97659574 0.98085106 0.9787234 0.97446809 0.97446809] mean value: 0.9763711583924349 key: test_jcc value: [0.92592593 0.78571429 0.82758621 0.89285714 0.92857143 0.92307692 0.88888889 0.92592593 0.92857143 1. ] mean value: 0.9027118156428502 key: train_jcc value: [0.95435685 0.94583333 0.95833333 0.9539749 0.95020747 0.95416667 0.9623431 0.958159 0.95020747 0.95 ] mean value: 0.9537582105013397 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), 
('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02532554 0.01064587 0.01129389 0.0115211 0.01132441 0.0113101 0.01137424 0.0111444 0.01132274 0.01128101] mean value: 0.012654328346252441 key: score_time value: [0.01154637 0.00929785 0.00982523 0.00981331 0.00931716 0.00963163 0.00972509 0.00960636 0.00930762 0.00981116] mean value: 0.009788179397583007 key: test_mcc value: [0.92704716 0.59688314 0.50336201 0.73131034 0.70064905 0.69436507 0.57735027 0.76923077 0.66628253 0.65433031] mean value: 0.6820810650750807 key: train_mcc value: [0.73987525 0.70625194 0.78298581 0.71521098 0.7745312 0.74043224 0.69894261 0.74470782 0.76195052 0.74048587] mean value: 0.7405374249725031 key: test_accuracy value: [0.96226415 0.79245283 0.75 0.86538462 0.84615385 0.84615385 0.78846154 0.88461538 0.82692308 0.82692308] mean value: 0.8389332365747459 key: train_accuracy value: [0.86993603 0.85287846 0.89148936 0.85744681 0.88723404 0.87021277 0.84893617 0.87234043 0.88085106 0.87021277] mean value: 0.8701537903189221 key: test_fscore value: [0.96 0.7755102 0.76363636 0.86792453 0.85714286 0.84 0.78431373 0.88461538 0.84210526 0.82352941] mean value: 0.8398777738190921 key: train_fscore value: [0.87048832 0.8496732 0.89171975 0.85529158 0.88794926 0.87048832 0.84463895 0.87179487 0.87931034 0.86937901] mean value: 0.8690733611272227 key: test_precision value: [1. 
0.86363636 0.72413793 0.85185185 0.8 0.875 0.8 0.88461538 0.77419355 0.84 ] mean value: 0.841343507952518 key: train_precision value: [0.86864407 0.86666667 0.88983051 0.86842105 0.88235294 0.86864407 0.86936937 0.87553648 0.89082969 0.875 ] mean value: 0.8755294848921722 key: test_recall value: [0.92307692 0.7037037 0.80769231 0.88461538 0.92307692 0.80769231 0.76923077 0.88461538 0.92307692 0.80769231] mean value: 0.8434472934472934 key: train_recall value: [0.87234043 0.83333333 0.89361702 0.84255319 0.89361702 0.87234043 0.8212766 0.86808511 0.86808511 0.86382979] mean value: 0.8629078014184397 key: test_roc_auc value: [0.96153846 0.79415954 0.75 0.86538462 0.84615385 0.84615385 0.78846154 0.88461538 0.82692308 0.82692308] mean value: 0.8390313390313391 key: train_roc_auc value: [0.8699309 0.85283688 0.89148936 0.85744681 0.88723404 0.87021277 0.84893617 0.87234043 0.88085106 0.87021277] mean value: 0.8701491180214584 key: test_jcc value: [0.92307692 0.63333333 0.61764706 0.76666667 0.75 0.72413793 0.64516129 0.79310345 0.72727273 0.7 ] mean value: 0.7280399378806105 key: train_jcc value: [0.77067669 0.73863636 0.8045977 0.74716981 0.79847909 0.77067669 0.73106061 0.77272727 0.78461538 0.76893939] mean value: 0.7687579004360319 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), 
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.10077667 0.09779096 0.08207917 0.07680631 0.07786369 0.07240462 0.07249331 0.07102299 0.07772779 0.081285 ] mean value: 0.08102505207061768 key: score_time value: [0.0129652 0.01151061 0.01169777 0.01227379 0.01141953 0.01082182 0.01121521 0.01066947 0.01165438 0.01114535] mean value: 0.011537313461303711 key: test_mcc value: [0.85164138 1. 0.84866842 0.9258201 0.96225045 0.9258201 0.88527041 0.88527041 0.9258201 1. ] mean value: 0.9210561378370106 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 1. 0.92307692 0.96153846 0.98076923 0.96153846 0.94230769 0.94230769 0.96153846 1. ] mean value: 0.9597605224963716 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92592593 1. 0.92592593 0.96296296 0.98113208 0.96 0.94117647 0.94117647 0.96296296 1. ] mean value: 0.9601262794425947 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 1. 0.89285714 0.92857143 0.96296296 1. 0.96 0.96 0.92857143 1. ] mean value: 0.9525820105820106 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 
0.96153846 1. 1. 0.92307692 0.92307692 0.92307692 1. 1. ] mean value: 0.9692307692307692 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.92521368 1. 0.92307692 0.96153846 0.98076923 0.96153846 0.94230769 0.94230769 0.96153846 1. ] mean value: 0.9598290598290599 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86206897 1. 0.86206897 0.92857143 0.96296296 0.92307692 0.88888889 0.88888889 0.92857143 1. ] mean value: 0.9245098451995004 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', 
use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05597258 0.06626916 0.07366967 0.09542322 0.04672813 0.05279183 0.08359885 0.06806421 0.04606438 0.07742214] mean value: 0.06660041809082032 key: score_time value: [0.02560306 0.01884866 0.01223159 0.02448392 0.0155127 0.01884794 0.02686143 0.01222873 0.01251125 0.01234651] mean value: 0.017947578430175783 key: test_mcc value: [0.89227454 0.85164138 0.61538462 0.81312325 0.74466871 0.73568294 0.65433031 0.80829038 0.73568294 0.81312325] mean value: 0.7664202294856968 key: train_mcc value: [0.90647462 0.90621761 0.91955698 0.90233192 0.90233192 0.90667855 0.91502618 0.91071251 0.91492675 0.90220118] mean value: 0.9086458224458123 key: test_accuracy value: [0.94339623 0.9245283 0.80769231 0.90384615 0.86538462 0.86538462 0.82692308 0.90384615 0.86538462 0.90384615] mean value: 0.881023222060958 key: train_accuracy value: [0.95309168 0.95309168 0.95957447 0.95106383 0.95106383 0.95319149 0.95744681 0.95531915 0.95744681 0.95106383] mean value: 0.9542353581635894 key: test_fscore value: [0.93877551 0.92307692 0.80769231 0.90909091 0.87719298 0.85714286 0.83018868 0.90196078 0.87272727 0.90909091] mean value: 0.8826939135040409 key: train_fscore value: [0.95378151 0.95319149 0.96016771 0.95157895 0.95157895 0.95378151 0.95780591 0.95560254 0.95762712 0.95137421] mean value: 0.9546489894196434 key: test_precision value: [1. 
0.96 0.80769231 0.86206897 0.80645161 0.91304348 0.81481481 0.92 0.82758621 0.86206897] mean value: 0.8773726351602252 key: train_precision value: [0.94190871 0.94915254 0.94628099 0.94166667 0.94166667 0.94190871 0.94979079 0.94957983 0.9535865 0.94537815] mean value: 0.9460919570890296 key: test_recall value: [0.88461538 0.88888889 0.80769231 0.96153846 0.96153846 0.80769231 0.84615385 0.88461538 0.92307692 0.96153846] mean value: 0.8927350427350428 key: train_recall value: [0.96595745 0.95726496 0.97446809 0.96170213 0.96170213 0.96595745 0.96595745 0.96170213 0.96170213 0.95744681] mean value: 0.9633860701945808 key: test_roc_auc value: [0.94230769 0.92521368 0.80769231 0.90384615 0.86538462 0.86538462 0.82692308 0.90384615 0.86538462 0.90384615] mean value: 0.8809829059829061 key: train_roc_auc value: [0.95306419 0.95310056 0.95957447 0.95106383 0.95106383 0.95319149 0.95744681 0.95531915 0.95744681 0.95106383] mean value: 0.9542334969994545 key: test_jcc value: [0.88461538 0.85714286 0.67741935 0.83333333 0.78125 0.75 0.70967742 0.82142857 0.77419355 0.83333333] mean value: 0.7922393802434124 key: train_jcc value: [0.91164659 0.91056911 0.9233871 0.90763052 0.90763052 0.91164659 0.91902834 0.91497976 0.91869919 0.90725806] mean value: 0.9132475768006711 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0344193 0.01032948 0.01007748 0.0096755 0.00967216 0.00974727 0.0099268 0.01015282 0.01100254 0.01081896] mean value: 0.012582230567932128 key: score_time value: [0.01461482 0.00918388 0.00888848 0.00873566 0.00871611 0.00898671 0.00880671 0.00905466 0.00932407 0.00937629] mean value: 0.009568738937377929 key: test_mcc value: [0.89227454 0.57616505 0.54494926 0.80829038 0.70064905 0.71151247 0.65824263 0.73568294 0.70064905 0.65824263] mean value: 0.6986657991505085 key: train_mcc value: [0.71462102 0.66795337 0.76214388 0.66895783 0.74910575 0.70654292 0.70690158 0.71128258 0.73659716 0.68550371] mean value: 0.7109609798884714 key: test_accuracy value: [0.94339623 0.77358491 0.76923077 0.90384615 0.84615385 0.84615385 0.82692308 0.86538462 0.84615385 0.82692308] mean value: 0.8447750362844703 key: train_accuracy value: [0.85714286 0.8336887 0.88085106 0.83404255 0.87446809 0.85319149 0.85319149 0.85531915 0.86808511 0.84255319] mean value: 0.8552533684162773 key: test_fscore value: [0.93877551 0.73913043 0.78571429 0.90566038 0.85714286 0.82608696 0.81632653 0.85714286 0.85714286 0.81632653] mean value: 0.8399449197234267 key: train_fscore value: [0.85529158 0.82969432 0.87878788 0.82969432 0.87311828 0.8516129 0.85032538 0.85217391 0.86580087 0.83982684] mean value: 0.8526326282826382 key: test_precision value: [1. 
0.89473684 0.73333333 0.88888889 0.8 0.95 0.86956522 0.91304348 0.8 0.86956522] mean value: 0.8719132977370964 key: train_precision value: [0.86842105 0.84821429 0.89427313 0.85201794 0.8826087 0.86086957 0.86725664 0.87111111 0.88105727 0.85462555] mean value: 0.8680455231850978 key: test_recall value: [0.88461538 0.62962963 0.84615385 0.92307692 0.92307692 0.73076923 0.76923077 0.80769231 0.92307692 0.76923077] mean value: 0.8206552706552707 key: train_recall value: [0.84255319 0.81196581 0.86382979 0.80851064 0.86382979 0.84255319 0.83404255 0.83404255 0.85106383 0.82553191] mean value: 0.8377923258774322 key: test_roc_auc value: [0.94230769 0.77635328 0.76923077 0.90384615 0.84615385 0.84615385 0.82692308 0.86538462 0.84615385 0.82692308] mean value: 0.8449430199430199 key: train_roc_auc value: [0.85717403 0.83364248 0.88085106 0.83404255 0.87446809 0.85319149 0.85319149 0.85531915 0.86808511 0.84255319] mean value: 0.8552518639752682 key: test_jcc value: [0.88461538 0.5862069 0.64705882 0.82758621 0.75 0.7037037 0.68965517 0.75 0.75 0.68965517] mean value: 0.7278481360124363 key: train_jcc value: [0.74716981 0.70895522 0.78378378 0.70895522 0.77480916 0.74157303 0.73962264 0.74242424 0.76335878 0.7238806 ] mean value: 0.7434532496453498 MCC on Blind test: 0.76 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, 
random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01762056 0.01975036 0.01909256 0.01915002 0.0176785 0.02011013 0.02195501 0.01924419 0.02043986 0.01871943] mean value: 0.019376063346862794 key: score_time value: [0.01073146 0.01117897 0.01201153 0.0118525 0.01197219 0.01188111 0.01207972 0.01185107 0.01192999 0.01186323] mean value: 0.011735177040100098 key: test_mcc value: [0.81688878 0.18759297 0.65433031 0.6789146 0.88527041 0.76923077 0.80829038 0.74466871 0.72760688 0.80829038] mean value: 0.7081084179489021 key: train_mcc value: [0.86416967 0.43722856 0.90213583 0.72315664 0.83806613 0.83960257 0.90233192 0.76845352 0.80635665 0.85958225] mean value: 0.7941083731200913 key: test_accuracy value: [0.90566038 0.54716981 0.82692308 0.82692308 0.94230769 0.88461538 0.90384615 0.86538462 0.84615385 0.90384615] mean value: 0.8452830188679246 key: train_accuracy value: [0.92963753 0.66098081 0.95106383 0.84468085 0.91702128 0.91702128 0.95106383 0.87234043 0.89787234 0.92978723] mean value: 0.887146940071678 key: test_fscore value: [0.90909091 0.25 0.82352941 0.8 0.94339623 0.88461538 0.90196078 0.85106383 0.86666667 0.90566038] mean value: 0.813598359001221 key: train_fscore value: [0.93333333 0.48543689 0.95116773 0.81704261 0.91275168 0.92152918 0.95157895 0.85436893 0.90551181 0.92993631] mean value: 0.8662657410357313 key: test_precision value: [0.86206897 0.8 0.84 0.94736842 0.92592593 0.88461538 0.92 0.95238095 0.76470588 0.88888889] mean value: 0.8785954420733966 key: train_precision value: [0.88846154 1. 
0.94915254 0.99390244 0.96226415 0.8740458 0.94166667 0.99435028 0.84249084 0.9279661 ] mean value: 0.9374300365667224 key: test_recall value: [0.96153846 0.14814815 0.80769231 0.69230769 0.96153846 0.88461538 0.88461538 0.76923077 1. 0.92307692] mean value: 0.8032763532763533 key: train_recall value: [0.98297872 0.32051282 0.95319149 0.69361702 0.86808511 0.97446809 0.96170213 0.74893617 0.9787234 0.93191489] mean value: 0.8414129841789416 key: test_roc_auc value: [0.90669516 0.5548433 0.82692308 0.82692308 0.94230769 0.88461538 0.90384615 0.86538462 0.84615385 0.90384615] mean value: 0.8461538461538461 key: train_roc_auc value: [0.92952355 0.66025641 0.95106383 0.84468085 0.91702128 0.91702128 0.95106383 0.87234043 0.89787234 0.92978723] mean value: 0.8870631023822513 key: test_jcc value: [0.83333333 0.14285714 0.7 0.66666667 0.89285714 0.79310345 0.82142857 0.74074074 0.76470588 0.82758621] mean value: 0.7183279135408953 key: train_jcc value: [0.875 0.32051282 0.90688259 0.69067797 0.83950617 0.85447761 0.90763052 0.74576271 0.82733813 0.86904762] mean value: 0.7836836144984219 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01870584 0.01970434 0.01880717 0.01866412 0.02053785 0.02331161 0.02050233 0.0209446 0.02116704 0.02013636] mean value: 0.02024812698364258 key: score_time value: [0.01103997 0.01269507 0.0119617 0.01208353 0.01191449 0.01273751 0.01195216 0.0127418 0.01198077 0.01185441] mean value: 0.012096142768859864 key: test_mcc value: [0.75007832 0.59347897 0.65824263 0.66666667 0.9258201 0.80829038 0.80829038 0.84866842 0.74466871 0.75878691] mean value: 0.7562991488419109 key: train_mcc value: [0.82318874 0.79500161 0.85288412 0.78776807 0.89946992 0.88344643 0.89198214 0.86302723 0.90351119 0.77446957] mean value: 0.8474749027324484 key: test_accuracy value: [0.86792453 0.77358491 0.82692308 0.80769231 0.96153846 0.90384615 0.90384615 0.92307692 0.86538462 0.86538462] mean value: 0.8699201741654572 key: train_accuracy value: [0.90618337 0.8891258 0.92340426 0.88723404 0.94893617 0.94042553 0.94468085 0.92978723 0.95106383 0.8787234 ] mean value: 0.9199564487592433 key: test_fscore value: [0.87719298 0.72727273 0.81632653 0.83870968 0.96296296 0.90196078 0.90196078 0.92 0.87719298 0.88135593] mean value: 0.8704935364010411 key: train_fscore value: [0.91338583 0.87619048 0.91855204 0.89668616 0.94736842 0.94262295 0.94672131 0.92650334 0.95238095 0.89017341] mean value: 0.9210584885895807 key: test_precision value: [0.80645161 0.94117647 0.86956522 0.72222222 0.92857143 0.92 0.92 0.95833333 0.80645161 0.78787879] mean value: 0.8660650685791763 key: train_precision value: [0.84981685 0.98924731 0.98067633 0.82733813 0.97737557 0.90909091 0.91304348 
0.97196262 0.92741935 0.81338028] mean value: 0.9159350825957544 key: test_recall value: [0.96153846 0.59259259 0.76923077 1. 1. 0.88461538 0.88461538 0.88461538 0.96153846 1. ] mean value: 0.8938746438746439 key: train_recall value: [0.98723404 0.78632479 0.86382979 0.9787234 0.91914894 0.9787234 0.98297872 0.88510638 0.9787234 0.98297872] mean value: 0.9343771594835424 key: test_roc_auc value: [0.86965812 0.77706553 0.82692308 0.80769231 0.96153846 0.90384615 0.90384615 0.92307692 0.86538462 0.86538462] mean value: 0.8704415954415955 key: train_roc_auc value: [0.90601018 0.88890707 0.92340426 0.88723404 0.94893617 0.94042553 0.94468085 0.92978723 0.95106383 0.8787234 ] mean value: 0.9199172576832151 key: test_jcc value: [0.78125 0.57142857 0.68965517 0.72222222 0.92857143 0.82142857 0.82142857 0.85185185 0.78125 0.78787879] mean value: 0.7756965177223798 key: train_jcc value: [0.84057971 0.77966102 0.84937238 0.81272085 0.9 0.89147287 0.89883268 0.86307054 0.90909091 0.80208333] mean value: 0.8546884294973143 MCC on Blind test: 0.82 Accuracy on Blind test: 0.9 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18638968 0.18391204 0.18381119 0.18072701 0.18269753 0.18260503 0.18581295 0.18405628 0.18478727 0.18364573] mean value: 0.1838444709777832 key: score_time value: [0.0155859 0.01653624 0.01634264 0.01569414 0.01681447 0.01641345 0.01549268 0.01697898 0.01570892 0.01532364] mean value: 0.016089105606079103 key: test_mcc value: [0.88730475 0.96296296 0.84866842 0.9258201 0.96225045 0.96225045 0.81312325 0.92307692 0.96225045 1. ] mean value: 0.9247707758320183 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94339623 0.98113208 0.92307692 0.96153846 0.98076923 0.98076923 0.90384615 0.96153846 0.98076923 1. ] mean value: 0.9616835994194485 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.98113208 0.92592593 0.96296296 0.98113208 0.98039216 0.89795918 0.96153846 0.98113208 1. ] mean value: 0.9613351387966895 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96 1. 0.89285714 0.92857143 0.96296296 1. 0.95652174 0.96153846 0.96296296 1. ] mean value: 0.9625414698023393 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92307692 0.96296296 0.96153846 1. 1. 0.96153846 0.84615385 0.96153846 1. 1. ] mean value: 0.9616809116809117 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94301994 0.98148148 0.92307692 0.96153846 0.98076923 0.98076923 0.90384615 0.96153846 0.98076923 1. 
] mean value: 0.9616809116809117 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.96296296 0.86206897 0.92857143 0.96296296 0.96153846 0.81481481 0.92592593 0.96296296 1. ] mean value: 0.927069737414565 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.98 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0607388 0.07653952 0.08514023 0.07402754 0.08832049 0.07226753 0.07678175 0.06374931 0.08635044 0.07893109] mean value: 0.07628467082977294 key: score_time value: [0.01762295 0.02642894 0.02865911 0.04618526 0.03816152 0.0224731 0.02270222 0.03807664 0.03926802 0.03778052] mean value: 0.03173582553863526 key: test_mcc value: [0.92450142 0.96296296 0.84866842 0.9258201 0.96225045 0.96225045 0.88527041 0.88527041 0.84615385 1. ] mean value: 0.9203148480995895 key: train_mcc value: [0.99150739 0.98721563 0.98301432 0.99152527 0.9957537 0.98724298 0.9957537 1. 0.9873145 0.99152527] mean value: 0.991085275446824 key: test_accuracy value: [0.96226415 0.98113208 0.92307692 0.96153846 0.98076923 0.98076923 0.94230769 0.94230769 0.92307692 1. ] mean value: 0.9597242380261248 key: train_accuracy value: [0.99573561 0.99360341 0.99148936 0.99574468 0.99787234 0.99361702 0.99787234 1. 
0.99361702 0.99574468] mean value: 0.9955296465998276 key: test_fscore value: [0.96153846 0.98113208 0.92592593 0.96296296 0.98113208 0.98039216 0.94117647 0.94117647 0.92307692 1. ] mean value: 0.9598513522486886 key: train_fscore value: [0.9957265 0.99357602 0.99145299 0.9957265 0.9978678 0.99363057 0.99787686 1. 0.99357602 0.9957265 ] mean value: 0.9955159747729551 key: test_precision value: [0.96153846 1. 0.89285714 0.92857143 0.96296296 1. 0.96 0.96 0.92307692 1. ] mean value: 0.9589006919006919 key: train_precision value: [1. 0.99570815 0.99570815 1. 1. 0.99152542 0.99576271 1. 1. 1. ] mean value: 0.9978704444606096 key: test_recall value: [0.96153846 0.96296296 0.96153846 1. 1. 0.96153846 0.92307692 0.92307692 0.92307692 1. ] mean value: 0.9616809116809117 key: train_recall value: [0.99148936 0.99145299 0.98723404 0.99148936 0.99574468 0.99574468 1. 1. 0.98723404 0.99148936] mean value: 0.9931878523367885 key: test_roc_auc value: [0.96225071 0.98148148 0.92307692 0.96153846 0.98076923 0.98076923 0.94230769 0.94230769 0.92307692 1. ] mean value: 0.9597578347578348 key: train_roc_auc value: [0.99574468 0.99359884 0.99148936 0.99574468 0.99787234 0.99361702 0.99787234 1. 0.99361702 0.99574468] mean value: 0.9955300963811602 key: test_jcc value: [0.92592593 0.96296296 0.86206897 0.92857143 0.96296296 0.96153846 0.88888889 0.88888889 0.85714286 1. ] mean value: 0.9238951342399618 key: train_jcc value: [0.99148936 0.98723404 0.98305085 0.99148936 0.99574468 0.98734177 0.99576271 1. 
0.98723404 0.99148936] mean value: 0.9910836182537762 MCC on Blind test: 0.9 Accuracy on Blind test: 0.95 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.14152217 0.1800046 0.18905711 0.15365958 0.15383959 0.17124987 0.15433812 0.15641809 0.15753102 0.1591301 ] mean value: 0.16167502403259276 key: score_time value: [0.02448511 0.02515721 0.02925134 0.0243063 0.02417588 0.02826023 0.02418399 0.02447724 0.02408624 0.02411389] mean value: 0.025249743461608888 key: test_mcc value: [0.82552431 0.66524218 0.3086067 0.65433031 0.73568294 0.70064905 0.76923077 0.69230769 0.76923077 0.54006172] mean value: 0.6660866435725556 key: train_mcc value: [0.99150739 0.99150708 0.9873145 0.98312115 0.9873145 0.99152527 0.9873145 0.9873145 0.9873145 0.9873145 ] mean value: 0.9881547880923972 key: test_accuracy value: [0.90566038 0.83018868 0.65384615 0.82692308 0.86538462 0.84615385 0.88461538 0.84615385 0.88461538 0.76923077] mean value: 0.831277213352685 key: train_accuracy value: [0.99573561 0.99573561 0.99361702 0.99148936 0.99361702 0.99574468 0.99361702 0.99361702 0.99361702 0.99361702] mean value: 0.9940407385564578 key: test_fscore value: [0.89361702 0.82352941 0.66666667 0.83018868 0.87272727 
0.83333333 0.88461538 0.84615385 0.88461538 0.76 ] mean value: 0.8295447000398473 key: train_fscore value: [0.9957265 0.99570815 0.99357602 0.99141631 0.99357602 0.9957265 0.99357602 0.99357602 0.99357602 0.99357602] mean value: 0.9940033557756031 key: test_precision value: [1. 0.875 0.64285714 0.81481481 0.82758621 0.90909091 0.88461538 0.84615385 0.88461538 0.79166667] mean value: 0.84764003557107 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.80769231 0.77777778 0.69230769 0.84615385 0.92307692 0.76923077 0.88461538 0.84615385 0.88461538 0.73076923] mean value: 0.8162393162393162 key: train_recall value: [0.99148936 0.99145299 0.98723404 0.98297872 0.98723404 0.99148936 0.98723404 0.98723404 0.98723404 0.98723404] mean value: 0.9880814693580651 key: test_roc_auc value: [0.90384615 0.83119658 0.65384615 0.82692308 0.86538462 0.84615385 0.88461538 0.84615385 0.88461538 0.76923077] mean value: 0.8311965811965811 key: train_roc_auc value: [0.99574468 0.9957265 0.99361702 0.99148936 0.99361702 0.99574468 0.99361702 0.99361702 0.99361702 0.99361702] mean value: 0.9940407346790325 key: test_jcc value: [0.80769231 0.7 0.5 0.70967742 0.77419355 0.71428571 0.79310345 0.73333333 0.79310345 0.61290323] mean value: 0.7138292445411467 key: train_jcc value: [0.99148936 0.99145299 0.98723404 0.98297872 0.98723404 0.99148936 0.98723404 0.98723404 0.98723404 0.98723404] mean value: 0.9880814693580651 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.73881483 0.72350812 0.73274493 0.72523093 0.73023582 0.74169731 0.7289629 0.72415876 0.72671056 0.72561407] mean value: 0.7297678232192993 key: score_time value: [0.00964332 0.00934935 0.00935745 0.00994277 0.00966001 0.00946784 0.00924683 0.00931072 0.00933957 0.00921845] mean value: 0.009453630447387696 key: test_mcc value: [0.85164138 1. 0.81312325 0.9258201 0.96225045 0.96225045 0.88527041 0.92307692 0.9258201 1. ] mean value: 0.9249253060070406 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9245283 1. 0.90384615 0.96153846 0.98076923 0.98076923 0.94230769 0.96153846 0.96153846 1. ] mean value: 0.9616835994194485 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92592593 1. 0.90909091 0.96296296 0.98113208 0.98039216 0.94117647 0.96153846 0.96296296 1. ] mean value: 0.9625181925403901 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89285714 1. 0.86206897 0.92857143 0.96296296 1. 0.96 0.96153846 0.92857143 1. ] mean value: 0.9496570390018666 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96153846 1. 0.96153846 1. 1. 0.96153846 0.92307692 0.96153846 1. 1. ] mean value: 0.9769230769230769 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.92521368 1. 0.90384615 0.96153846 0.98076923 0.98076923 0.94230769 0.96153846 0.96153846 1. 
] mean value: 0.9617521367521368 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86206897 1. 0.83333333 0.92857143 0.96296296 0.96153846 0.88888889 0.92592593 0.92857143 1. ] mean value: 0.9291861395309671 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.93 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier',
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03048921 0.02674842 0.03247237 0.03025341 0.03184032 0.04581976 0.09403157 0.06438708 0.03938842 0.05673742] mean value: 0.04521679878234863 key: score_time value: [0.01284647 0.01716471 0.02299953 0.01456118 0.03505945 0.02486992 0.02550769 0.02261448 0.01530409 0.01726437] mean value: 0.02081918716430664 key: test_mcc value: [0.29676375 0.3960114 0.50951017 0.34684399 0.45095603 0.6172134 0.27386128 0.36896403 0.4233902 0.13323468] mean value: 0.3816748916887629 key: train_mcc value: [0.80521616 0.96592046 0.96609741 0.68800744 0.84577093 0.97880317 0.74239822 0.59537119 0.89871703 0.63481105] mean value: 0.8121113061220316 key: test_accuracy value: [0.64150943 0.69811321 0.75 0.65384615 0.71153846 0.80769231 0.61538462 0.67307692 0.71153846 0.55769231] mean value: 0.6820391872278665 key: train_accuracy value: [0.89339019 0.98294243 0.98297872 0.8212766 0.91702128 0.9893617 0.85531915 0.76170213 
0.94680851 0.78723404] mean value: 0.8938034750260854 key: test_fscore value: [0.6779661 0.7037037 0.77192982 0.71875 0.75409836 0.8 0.6969697 0.72131148 0.71698113 0.64615385] mean value: 0.7207864141224611 key: train_fscore value: [0.90384615 0.98297872 0.98312236 0.84837545 0.92337917 0.98942918 0.87360595 0.80756014 0.94382022 0.8245614 ] mean value: 0.9080678755351793 key: test_precision value: [0.60606061 0.7037037 0.70967742 0.60526316 0.65714286 0.83333333 0.575 0.62857143 0.7037037 0.53846154] mean value: 0.6560917748226747 key: train_precision value: [0.8245614 0.97881356 0.9748954 0.73667712 0.85766423 0.98319328 0.77557756 0.67723343 1. 0.70149254] mean value: 0.8510108511659394 key: test_recall value: [0.76923077 0.7037037 0.84615385 0.88461538 0.88461538 0.76923077 0.88461538 0.84615385 0.73076923 0.80769231] mean value: 0.8126780626780626 key: train_recall value: [1. 0.98717949 0.99148936 1. 1. 0.99574468 1. 1. 0.89361702 1. ] mean value: 0.9868030551009275 key: test_roc_auc value: [0.64387464 0.6980057 0.75 0.65384615 0.71153846 0.80769231 0.61538462 0.67307692 0.71153846 0.55769231] mean value: 0.6822649572649573 key: train_roc_auc value: [0.89316239 0.98295145 0.98297872 0.8212766 0.91702128 0.9893617 0.85531915 0.76170213 0.94680851 0.78723404] mean value: 0.8937815966539371 key: test_jcc value: [0.51282051 0.54285714 0.62857143 0.56097561 0.60526316 0.66666667 0.53488372 0.56410256 0.55882353 0.47727273] mean value: 0.5652237060283873 key: train_jcc value: [0.8245614 0.9665272 0.96680498 0.73667712 0.85766423 0.9790795 0.77557756 0.67723343 0.89361702 0.70149254] mean value: 0.8379234972627273 MCC on Blind test: 0.41 Accuracy on Blind test: 0.67 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02924728 0.04173183 0.03825188 0.03950167 0.05261588 0.02752686 0.03700113 0.03158307 0.03224421 0.03502917] mean value: 0.03647329807281494 key: score_time value: [0.02261496 0.01868033 0.01876545 0.0188148 0.02677727 0.0213027 0.01892281 0.01888084 0.0189209 0.01876068] mean value: 0.020244073867797852 key: test_mcc value: [0.92704716 0.88746439 0.61538462 0.88527041 0.79056942 0.73568294 0.80829038 0.84615385 0.74466871 0.84866842] mean value: 0.8089200292799028 key: train_mcc value: [0.86799458 0.8681985 0.85559807 0.85559807 0.85113319 0.86433077 0.88136192 0.85559807 0.85958225 0.85113319] mean value: 0.8610528586884763 key: test_accuracy value: [0.96226415 0.94339623 0.80769231 0.94230769 0.88461538 0.86538462 0.90384615 0.92307692 0.86538462 0.92307692] mean value: 0.9021044992743106 key: train_accuracy value: [0.93390192 0.93390192 0.92765957 0.92765957 0.92553191 0.93191489 0.94042553 0.92765957 0.92978723 0.92553191] mean value: 0.9303974050719049 key: test_fscore value: [0.96 0.94339623 0.80769231 0.94339623 0.89655172 0.85714286 0.90196078 0.92307692 0.87719298 0.92592593] mean value: 0.9036335957575999 key: train_fscore value: [0.93473684 0.93473684 0.92857143 0.92857143 0.92600423 0.93305439 0.94142259 0.92857143 0.92993631 0.92600423] mean value: 0.9311609719764614 key: test_precision value: [1. 
0.96153846 0.80769231 0.92592593 0.8125 0.91304348 0.92 0.92307692 0.80645161 0.89285714] mean value: 0.8963085852254856 key: train_precision value: [0.925 0.92116183 0.91701245 0.91701245 0.92016807 0.91769547 0.92592593 0.91701245 0.9279661 0.92016807] mean value: 0.9209122805450133 key: test_recall value: [0.92307692 0.92592593 0.80769231 0.96153846 1. 0.80769231 0.88461538 0.92307692 0.96153846 0.96153846] mean value: 0.9156695156695157 key: train_recall value: [0.94468085 0.94871795 0.94042553 0.94042553 0.93191489 0.94893617 0.95744681 0.94042553 0.93191489 0.93191489] mean value: 0.9416803055100927 key: test_roc_auc value: [0.96153846 0.94373219 0.80769231 0.94230769 0.88461538 0.86538462 0.90384615 0.92307692 0.86538462 0.92307692] mean value: 0.9020655270655271 key: train_roc_auc value: [0.93387889 0.93393344 0.92765957 0.92765957 0.92553191 0.93191489 0.94042553 0.92765957 0.92978723 0.92553191] mean value: 0.9303982542280415 key: test_jcc value: [0.92307692 0.89285714 0.67741935 0.89285714 0.8125 0.75 0.82142857 0.85714286 0.78125 0.86206897] mean value: 0.8270600957718588 key: train_jcc value: [0.87747036 0.87747036 0.86666667 0.86666667 0.86220472 0.8745098 0.88932806 0.86666667 0.86904762 0.86220472] mean value: 0.8712235646491643 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.2604568 0.26721501 0.30715322 0.33151817 0.27433705 0.28127027 0.27576518 0.27336073 0.27455401 0.30313301] mean value: 0.28487634658813477 key: score_time value: [0.02250957 0.01868081 0.02003098 0.01883531 0.01878572 0.01876616 0.01875806 0.01878333 0.01887512 0.01968741] mean value: 0.01937124729156494 key: test_mcc value: [0.92704716 0.88746439 0.61538462 0.88527041 0.79056942 0.73568294 0.80829038 0.84615385 0.74466871 0.84866842] mean value: 0.8089200292799028 key: train_mcc value: [0.86799458 0.8681985 0.85559807 0.85559807 0.85113319 0.86433077 0.88136192 0.85559807 0.85958225 0.85113319] mean value: 0.8610528586884763 key: test_accuracy value: [0.96226415 0.94339623 0.80769231 0.94230769 0.88461538 0.86538462 0.90384615 0.92307692 0.86538462 0.92307692] mean value: 0.9021044992743106 key: train_accuracy value: [0.93390192 0.93390192 0.92765957 0.92765957 0.92553191 0.93191489 0.94042553 0.92765957 0.92978723 0.92553191] mean value: 0.9303974050719049 key: test_fscore value: [0.96 0.94339623 0.80769231 0.94339623 0.89655172 0.85714286 0.90196078 0.92307692 0.87719298 0.92592593] mean value: 0.9036335957575999 key: train_fscore value: [0.93473684 0.93473684 0.92857143 0.92857143 0.92600423 0.93305439 0.94142259 0.92857143 0.92993631 0.92600423] mean value: 0.9311609719764614 key: test_precision value: [1. 0.96153846 0.80769231 0.92592593 0.8125 0.91304348 0.92 0.92307692 0.80645161 0.89285714] mean value: 0.8963085852254856 key: train_precision value: [0.925 0.92116183 0.91701245 0.91701245 0.92016807 0.91769547 0.92592593 0.91701245 0.9279661 0.92016807] mean value: 0.9209122805450133 key: test_recall value: [0.92307692 0.92592593 0.80769231 0.96153846 1. 0.80769231 0.88461538 0.92307692 0.96153846 0.96153846] mean value: 0.9156695156695157 key: train_recall value: [0.94468085 0.94871795 0.94042553 0.94042553 0.93191489 0.94893617 0.95744681 0.94042553 0.93191489 0.93191489] mean value: 0.9416803055100927 key: test_roc_auc value: [0.96153846 0.94373219 0.80769231 0.94230769 0.88461538 0.86538462 0.90384615 0.92307692 0.86538462 0.92307692] mean value: 0.9020655270655271 key: train_roc_auc value: [0.93387889 0.93393344 0.92765957 0.92765957 0.92553191 0.93191489 0.94042553 0.92765957 0.92978723 0.92553191] mean value: 0.9303982542280415 key: test_jcc value: [0.92307692 0.89285714 0.67741935 0.89285714 0.8125 0.75 0.82142857 0.85714286 0.78125 0.86206897] mean value: 0.8270600957718588 key: train_jcc value: [0.87747036 0.87747036 0.86666667 0.86666667 0.86220472 0.8745098 0.88932806 0.86666667 0.86904762 0.86220472] mean value: 0.8712235646491643 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:188: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_sl.py:191: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)