/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_8020.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
1.22.4
1.4.1

aaindex_df contains non-numerical data
Total no. of non-numerical columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df

Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df

Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123

Proceeding to merge, expected nrows in merged_df: 1133
PASS: my_features_df and aa_df successfully combined
nrows: 1133
ncols: 274

count of NULL values before imputation
or_mychisq          339
log10_or_mychisq    339
dtype: int64

count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML

Total no. of features for aaindex: 123
No. of numerical features: 169
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 176

-------------------------------------------------------------
Successfully split data with stratification: 80/20
Train data size: (445, 176)
Test data size: (112, 176)
y_train numbers: Counter({0: 225, 1: 220})
y_train ratio: 1.0227272727272727
y_test_numbers: Counter({0: 57, 1: 55})
y_test ratio: 1.0363636363636364
-------------------------------------------------------------

Simple Random OverSampling
Counter({1: 225, 0: 225})
(450, 176)
Simple Random UnderSampling
Counter({0: 220, 1: 220})
(440, 176)
Simple Combined Over and UnderSampling
Counter({0: 225, 1: 225})
(450, 176)
SMOTE_NC OverSampling
Counter({1: 225, 0: 225})
(450, 176)

#####################################################################
Running ML analysis: 80/20 split
Gene name: rpoB
Drug name: rifampicin
Output directory: /home/tanu/git/Data/rifampicin/output/ml/tts_8020/
Sanity checks:
ML source data size: (557, 176)
Total input features: (445, 176)
Target feature numbers: Counter({0: 225, 1: 220})
Target features ratio: 1.0227272727272727
#####################################################################

================================================================
Structural features (n): 37
These are:
Common stability features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
                           colsample_bynode=None, colsample_bytree=None, enable_categorical=False,
                           gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None,
                           learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None,
                           missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None,
                           num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None,
                           reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None,
                           use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_na_affinity', 'mcsm_ppi2_affinity',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=169)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.08822989 0.09819555 0.1139729 0.10658669 0.11813283 0.05520678 0.07728314 0.09269142 0.11131549 0.06736732]
mean value: 0.09289820194244384

key: score_time
value: [0.01899791 0.02099395 0.02197051 0.02467132 0.05310249 0.02295399 0.02127385 0.02181339 0.0188055 0.01463914]
mean value: 0.023922204971313477

key: test_mcc
value: [0.82506438 0.86732843 0.68911026 0.8360602 0.86758893 0.86452993 0.86452993 0.77352678 0.77352678 0.77352678]
mean value: 0.8134792424092705

key: train_mcc
value: [0.860043 0.85528899 0.8500425 0.869987 0.85018502 0.86053339 0.85041172 0.85535874 0.8705095 0.87541359]
mean value: 0.8597773460103459

key: test_accuracy
value: [0.91111111 0.93333333 0.84444444 0.91111111 0.93333333 0.93181818 0.93181818 0.88636364 0.88636364 0.88636364]
mean value: 0.9056060606060606

key: train_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.93 0.9275 0.925 0.935 0.925 0.93017456 0.92518703 0.9276808 0.93516209 0.93765586] mean value: 0.9298360349127183 key: test_fscore value: [0.9047619 0.93023256 0.8372093 0.91666667 0.93333333 0.93023256 0.93333333 0.88888889 0.88888889 0.88888889] mean value: 0.9052436323366556 key: train_fscore value: [0.92964824 0.9276808 0.92462312 0.93434343 0.925 0.93 0.92462312 0.92695214 0.935 0.93734336] mean value: 0.9295214204164155 key: test_precision value: [0.95 0.95238095 0.85714286 0.84615385 0.91304348 0.95238095 0.91304348 0.86956522 0.86956522 0.86956522] mean value: 0.8992841216754259 key: train_precision value: [0.925 0.91625616 0.92 0.93434343 0.91584158 0.92079208 0.92 0.92462312 0.92574257 0.93034826] mean value: 0.9232947203887022 key: test_recall value: [0.86363636 0.90909091 0.81818182 1. 
0.95454545 0.90909091 0.95454545 0.90909091 0.90909091 0.90909091] mean value: 0.9136363636363636 key: train_recall value: [0.93434343 0.93939394 0.92929293 0.93434343 0.93434343 0.93939394 0.92929293 0.92929293 0.94444444 0.94444444] mean value: 0.9358585858585858 key: test_roc_auc value: [0.91007905 0.93280632 0.84387352 0.91304348 0.93379447 0.93181818 0.93181818 0.88636364 0.88636364 0.88636364] mean value: 0.9056324110671937 key: train_roc_auc value: [0.930043 0.92761776 0.9250425 0.9349935 0.92509251 0.9302881 0.9252376 0.92770065 0.93527641 0.93773946] mean value: 0.9299031504135635 key: test_jcc value: [0.82608696 0.86956522 0.72 0.84615385 0.875 0.86956522 0.875 0.8 0.8 0.8 ] mean value: 0.8281371237458194 key: train_jcc value: [0.8685446 0.86511628 0.85981308 0.87677725 0.86046512 0.86915888 0.85981308 0.86384977 0.87793427 0.88207547] mean value: 0.8683547803458409 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.07467175 1.19143915 3.18981433 1.21566057 1.01042342 0.9827199 1.26141262 2.24912834 3.37447715 2.48636866] mean value: 1.8036115884780883 key: score_time value: [0.01486397 0.01910448 0.01350474 0.01525831 0.02177715 0.02520943 0.01491761 0.02106428 0.03649759 0.02547741] mean value: 0.02076749801635742 key: test_mcc value: [0.82506438 0.82506438 0.64613475 0.79854941 0.91485328 0.81818182 0.86452993 0.81818182 0.81818182 0.77352678] mean value: 0.8102268367108156 key: train_mcc value: [0.89018902 0.900045 0.83510219 0.89002252 0.88510532 0.89536533 0.90029107 0.89025725 0.89555655 0.90043786] mean value: 0.8882372108052222 key: test_accuracy value: [0.91111111 0.91111111 0.82222222 0.88888889 0.95555556 0.90909091 0.93181818 0.90909091 0.90909091 0.88636364] mean value: 0.9034343434343434 key: train_accuracy value: [0.945 0.95 0.9175 0.945 0.9425 0.94763092 0.95012469 0.94513716 0.94763092 0.95012469] mean value: 0.9440648379052369 key: test_fscore value: [0.9047619 0.9047619 0.80952381 0.89795918 0.95652174 0.90909091 0.93333333 0.90909091 0.90909091 0.88888889] mean value: 0.9023023491346472 key: train_fscore value: [0.945 0.94974874 0.91729323 0.94416244 0.94235589 0.94736842 0.94974874 0.94444444 0.94763092 0.95 ] mean value: 0.943775283498277 key: test_precision value: [0.95 0.95 0.85 0.81481481 0.91666667 0.90909091 0.91304348 0.90909091 0.90909091 0.86956522] mean value: 0.8991362904406383 key: train_precision value: [0.93564356 0.945 0.91044776 0.94897959 0.93532338 0.94029851 0.945 0.94444444 0.93596059 0.94059406] mean value: 
0.9381691902917854 key: test_recall value: [0.86363636 0.86363636 0.77272727 1. 1. 0.90909091 0.95454545 0.90909091 0.90909091 0.90909091] mean value: 0.9090909090909091 key: train_recall value: [0.95454545 0.95454545 0.92424242 0.93939394 0.94949495 0.95454545 0.95454545 0.94444444 0.95959596 0.95959596] mean value: 0.9494949494949495 key: test_roc_auc value: [0.91007905 0.91007905 0.82114625 0.89130435 0.95652174 0.90909091 0.93181818 0.90909091 0.90909091 0.88636364] mean value: 0.9034584980237155 key: train_roc_auc value: [0.94509451 0.950045 0.91756676 0.94494449 0.94256926 0.94771608 0.95017913 0.94512863 0.94777828 0.95024133] mean value: 0.9441263461321502 key: test_jcc value: [0.82608696 0.82608696 0.68 0.81481481 0.91666667 0.83333333 0.875 0.83333333 0.83333333 0.8 ] mean value: 0.823865539452496 key: train_jcc value: [0.8957346 0.90430622 0.84722222 0.89423077 0.89099526 0.9 0.90430622 0.89473684 0.90047393 0.9047619 ] mean value: 0.8936767969980741 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Gaussian NB Model func: GaussianNB() 
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.02495861 0.01450539 0.01457644 0.01471186 0.01506543 0.01446843 0.01499057 0.01478457 0.01477289 0.01486278] mean value: 0.015769696235656737 key: score_time value: [0.01284051 0.01264167 0.01298308 0.012573 0.01277876 0.01334071 0.01299834 0.0127492 0.01270795 0.0127039 ] mean value: 0.012831711769104004 key: test_mcc value: [0.56261436 0.66660455 0.74410286 0.51089209 0.60079051 0.72727273 0.66143783 0.68252363 0.43151697 0.68252363] mean value: 0.6270279162046954 key: train_mcc value: [0.68538393 0.67445688 0.70858632 0.68778613 0.64887146 0.64847406 0.66903696 0.68058469 0.68521411 0.6862916 ] mean value: 0.6774686130760773 key: test_accuracy value: [0.77777778 0.82222222 0.86666667 0.75555556 0.8 0.86363636 0.81818182 0.84090909 0.70454545 0.84090909] mean value: 0.809040404040404 key: train_accuracy value: [0.84 0.835 0.8525 0.8425 0.8225 0.82044888 0.83291771 0.83790524 0.840399 0.84289277] mean value: 0.8367063591022443 key: test_fscore value: [0.75 0.78947368 0.85 0.74418605 0.8 0.86363636 0.78947368 0.8372093 0.64864865 0.84444444] mean value: 0.7917072173987718 key: train_fscore value: [0.82702703 0.82258065 0.84266667 0.83289125 0.80965147 0.8021978 0.82133333 0.82479784 0.82795699 0.84367246] mean value: 0.8254775485090063 key: test_precision value: [0.83333333 0.9375 0.94444444 0.76190476 0.7826087 0.86363636 0.9375 0.85714286 0.8 0.82608696] mean value: 0.8544157412635673 key: train_precision value: [0.88953488 0.87931034 0.89265537 0.87709497 0.86285714 0.87951807 0.8700565 0.88439306 0.88505747 0.82926829] mean value: 0.8749746107699744 key: test_recall 
value: [0.68181818 0.68181818 0.77272727 0.72727273 0.81818182 0.86363636 0.68181818 0.81818182 0.54545455 0.86363636] mean value: 0.7454545454545455 key: train_recall value: [0.77272727 0.77272727 0.7979798 0.79292929 0.76262626 0.73737374 0.77777778 0.77272727 0.77777778 0.85858586] mean value: 0.7823232323232323 key: test_roc_auc value: [0.7756917 0.81916996 0.86462451 0.75494071 0.80039526 0.86363636 0.81818182 0.84090909 0.70454545 0.84090909] mean value: 0.808300395256917 key: train_roc_auc value: [0.83933393 0.83438344 0.8519602 0.8420092 0.82190719 0.81942578 0.83223864 0.83710255 0.83962781 0.84308603] mean value: 0.8361074777428482 key: test_jcc value: [0.6 0.65217391 0.73913043 0.59259259 0.66666667 0.76 0.65217391 0.72 0.48 0.73076923] mean value: 0.6593506750898055 key: train_jcc value: [0.70506912 0.69863014 0.7281106 0.71363636 0.68018018 0.66972477 0.69683258 0.70183486 0.70642202 0.72961373] mean value: 0.7030054368772396 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() 
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01549268 0.01472116 0.01465893 0.0147357 0.01448822 0.01551938 0.01875544 0.02495813 0.01451755 0.01468372] mean value: 0.016253089904785155 key: score_time value: [0.01308107 0.01281691 0.01316237 0.01281404 0.01322484 0.01289582 0.02950215 0.01286101 0.01288104 0.01303196] mean value: 0.01462712287902832 key: test_mcc value: [0.68911026 0.77821935 0.64426877 0.60079051 0.70780516 0.63636364 0.73029674 0.63636364 0.5547002 0.77352678] mean value: 0.6751445052594404 key: train_mcc value: [0.73006509 0.73513714 0.7700385 0.74497106 0.74497106 0.71074778 0.7306343 0.75588396 0.69693637 0.75588396] mean value: 0.7375269233005722 key: test_accuracy value: [0.84444444 0.88888889 0.82222222 0.8 0.84444444 0.81818182 0.86363636 0.81818182 0.77272727 0.88636364] mean value: 0.8359090909090909 key: train_accuracy value: [0.865 0.8675 0.885 0.8725 0.8725 0.8553616 0.86533666 0.87780549 0.8478803 0.87780549] mean value: 0.8686689526184539 key: test_fscore value: [0.8372093 0.88372093 0.81818182 0.8 0.85714286 0.81818182 0.85714286 0.81818182 0.75 0.88888889] mean value: 0.8328650290278198 key: train_fscore value: [0.8622449 0.86445013 0.88442211 0.87088608 0.87088608 0.85204082 0.86294416 0.87780549 0.84073107 0.87780549] mean value: 0.866421631011566 key: test_precision value: [0.85714286 0.9047619 0.81818182 0.7826087 0.77777778 0.81818182 0.9 0.81818182 0.83333333 0.86956522] mean value: 0.8379735240604806 key: train_precision value: [0.87113402 0.87564767 0.88 0.87309645 0.87309645 0.86082474 0.86734694 0.86699507 0.87027027 0.86699507] mean value: 0.8705406681510427 key: 
test_recall value: [0.81818182 0.86363636 0.81818182 0.81818182 0.95454545 0.81818182 0.81818182 0.81818182 0.68181818 0.90909091] mean value: 0.8318181818181818 key: train_recall value: [0.85353535 0.85353535 0.88888889 0.86868687 0.86868687 0.84343434 0.85858586 0.88888889 0.81313131 0.88888889] mean value: 0.8626262626262626 key: test_roc_auc value: [0.84387352 0.88833992 0.82213439 0.80039526 0.84683794 0.81818182 0.86363636 0.81818182 0.77272727 0.88636364] mean value: 0.8360671936758893 key: train_roc_auc value: [0.86488649 0.86736174 0.8850385 0.87246225 0.87246225 0.85521471 0.86525352 0.87794198 0.84745236 0.87794198] mean value: 0.8686015769064591 key: test_jcc value: [0.72 0.79166667 0.69230769 0.66666667 0.75 0.69230769 0.75 0.69230769 0.6 0.8 ] mean value: 0.715525641025641 key: train_jcc value: [0.75784753 0.76126126 0.79279279 0.77130045 0.77130045 0.74222222 0.75892857 0.78222222 0.72522523 0.78222222] mean value: 0.7645322947867791 MCC on Blind test: 0.75 Accuracy on Blind test: 0.88 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() 
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01377344 0.0139215 0.01400995 0.02145243 0.01363587 0.01715064 0.01344013 0.01378942 0.03159833 0.03337765] mean value: 0.018614935874938964 key: score_time value: [0.09378958 0.03972101 0.03561568 0.05780077 0.05395031 0.0524869 0.03577709 0.05378652 0.04946804 0.04471517] mean value: 0.051711106300354005 key: test_mcc value: [0.48086334 0.42403053 0.29512214 0.19960474 0.55666994 0.64715023 0.59648091 0.59648091 0.50051733 0.59648091] mean value: 0.4893400981241138 key: train_mcc value: [0.68527843 0.70019536 0.70019536 0.70627441 0.68496131 0.7009041 0.66084236 0.66734561 0.71074778 0.69173625] mean value: 0.690848097079054 key: test_accuracy value: [0.73333333 0.71111111 0.64444444 0.6 0.77777778 0.81818182 0.79545455 0.79545455 0.75 0.79545455] mean value: 0.7421212121212121 key: train_accuracy value: [0.8425 0.85 0.85 0.8525 0.8425 0.85037406 0.83042394 0.83291771 0.8553616 0.84538653] mean value: 0.8451963840399003 key: test_fscore value: [0.68421053 0.68292683 0.57894737 0.59090909 0.76190476 0.83333333 0.7804878 0.7804878 0.74418605 0.80851064] mean value: 0.7245904204717919 key: train_fscore value: [0.83804627 0.84615385 0.84615385 0.845953 0.84050633 0.84615385 0.82653061 0.82414698 0.85204082 0.83854167] mean value: 0.8404227219545394 key: test_precision value: [0.8125 0.73684211 0.6875 0.59090909 0.8 0.76923077 0.84210526 0.84210526 0.76190476 0.76 ] mean value: 0.760309725362357 key: train_precision value: [0.85340314 0.859375 0.859375 0.87567568 0.84263959 0.859375 0.83505155 0.8579235 0.86082474 0.8655914 ] mean value: 0.8569234594722578 key: 
test_recall value: [0.59090909 0.63636364 0.5 0.59090909 0.72727273 0.90909091 0.72727273 0.72727273 0.72727273 0.86363636] mean value: 0.7 key: train_recall value: [0.82323232 0.83333333 0.83333333 0.81818182 0.83838384 0.83333333 0.81818182 0.79292929 0.84343434 0.81313131] mean value: 0.8247474747474748 key: test_roc_auc value: [0.73023715 0.70948617 0.64130435 0.59980237 0.77667984 0.81818182 0.79545455 0.79545455 0.75 0.79545455] mean value: 0.7412055335968379 key: train_roc_auc value: [0.84230923 0.84983498 0.84983498 0.85216022 0.84245925 0.8501642 0.83027318 0.83242524 0.85521471 0.8449893 ] mean value: 0.8449665286725717 key: test_jcc value: [0.52 0.51851852 0.40740741 0.41935484 0.61538462 0.71428571 0.64 0.64 0.59259259 0.67857143] mean value: 0.5746115115469954 key: train_jcc value: [0.72123894 0.73333333 0.73333333 0.73303167 0.72489083 0.73333333 0.70434783 0.70089286 0.74222222 0.72197309] mean value: 0.7248597441578004 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02649641 0.02651262 0.02658916 0.02641249 0.0266242 0.02697945 0.02693439 0.02807307 0.02710104 0.0273788 ] mean value: 0.026910161972045897 key: score_time value: [0.01609492 0.01617599 0.01605654 0.0158565 0.03611541 0.01599598 0.01581216 0.0163033 0.01630783 0.01622319] mean value: 0.018094182014465332 key: test_mcc value: [0.78405645 0.82506438 0.60000118 0.74605372 0.82574419 0.86452993 0.90909091 0.72727273 0.81818182 0.77352678] mean value: 0.7873522091161106 key: train_mcc value: [0.80528086 0.80528086 0.81500094 0.809981 0.79998 0.79560664 0.79055651 0.80547816 0.80053238 0.81050825] mean value: 0.8038205592014417 key: test_accuracy value: [0.88888889 0.91111111 0.8 0.86666667 0.91111111 0.93181818 0.95454545 0.86363636 0.90909091 0.88636364] mean value: 0.8923232323232323 key: train_accuracy value: [0.9025 0.9025 0.9075 0.905 0.9 0.89775561 0.89526185 0.90274314 0.90024938 0.90523691] mean value: 0.9018746882793017 key: test_fscore value: [0.87804878 0.9047619 0.79069767 0.875 0.91304348 0.93023256 0.95454545 0.86363636 0.90909091 0.88888889] mean value: 0.8907946012230334 key: train_fscore value: [0.90274314 0.90274314 0.90680101 0.9040404 0.8989899 0.89724311 0.89447236 0.90176322 0.89949749 0.90452261] mean value: 0.9012816389138596 key: test_precision value: [0.94736842 0.95 0.80952381 0.80769231 0.875 0.95238095 0.95454545 0.86363636 0.90909091 0.86956522] mean value: 0.8938803435313732 key: train_precision value: [0.89162562 0.89162562 0.90452261 0.9040404 0.8989899 0.89054726 0.89 0.89949749 0.895 0.9 ] mean value: 0.8965848898741502 key: 
test_recall value: [0.81818182 0.86363636 0.77272727 0.95454545 0.95454545 0.90909091 0.95454545 0.86363636 0.90909091 0.90909091] mean value: 0.8909090909090909 key: train_recall value: [0.91414141 0.91414141 0.90909091 0.9040404 0.8989899 0.9040404 0.8989899 0.9040404 0.9040404 0.90909091] mean value: 0.9060606060606061 key: test_roc_auc value: [0.88735178 0.91007905 0.79940711 0.86857708 0.91205534 0.93181818 0.95454545 0.86363636 0.90909091 0.88636364] mean value: 0.8922924901185771 key: train_roc_auc value: [0.90261526 0.90261526 0.90751575 0.9049905 0.89999 0.89783301 0.89530776 0.90275912 0.90029606 0.90528437] mean value: 0.9019207093123105 key: test_jcc value: [0.7826087 0.82608696 0.65384615 0.77777778 0.84 0.86956522 0.91304348 0.76 0.83333333 0.8 ] mean value: 0.8056261612783352 key: train_jcc value: [0.82272727 0.82272727 0.82949309 0.82488479 0.81651376 0.81363636 0.80909091 0.82110092 0.8173516 0.82568807] mean value: 0.8203214048833244 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.87617755 3.59716487 3.2788682 3.04321766 2.02100229 0.76258993 0.71084738 1.37337828 0.62539959 1.43995023] mean value: 1.9728595972061158 key: score_time value: [0.02402663 0.0251689 0.02707911 0.05417156 0.02970004 0.01262379 0.02017164 0.02028584 0.02023172 0.02038431] mean value: 0.025384354591369628 key: test_mcc value: [0.86732843 0.77821935 0.60404349 0.76206649 0.86758893 0.51031036 0.86452993 0.77352678 0.77352678 0.60678804] mean value: 0.7407928602447988 key: train_mcc value: [0.99501219 1. 0.99501219 1. 1. 
0.63345212 0.79681808 0.81118415 0.78683326 0.76497588] mean value: 0.8783287861680027 key: test_accuracy value: [0.93333333 0.88888889 0.8 0.86666667 0.93333333 0.72727273 0.93181818 0.88636364 0.88636364 0.79545455] mean value: 0.8649494949494949 key: train_accuracy value: [0.9975 1. 0.9975 1. 1. 0.79301746 0.89775561 0.90523691 0.89276808 0.87531172] mean value: 0.9359089775561098 key: test_fscore value: [0.93023256 0.88372093 0.7804878 0.88 0.93333333 0.77777778 0.93333333 0.88888889 0.88888889 0.81632653] mean value: 0.8712990046084609 key: train_fscore value: [0.99748111 1. 0.99748111 1. 1. 0.82377919 0.8992629 0.90594059 0.89434889 0.88479263] mean value: 0.940308642422994 key: test_precision value: [0.95238095 0.9047619 0.84210526 0.78571429 0.91304348 0.65625 0.91304348 0.86956522 0.86956522 0.74074074] mean value: 0.8447170538060126 key: train_precision value: [0.99497487 1. 0.99497487 1. 1. 0.71062271 0.87559809 0.88834951 0.8708134 0.81355932] mean value: 0.9148892779217023 key: test_recall value: [0.90909091 0.86363636 0.72727273 1. 0.95454545 0.95454545 0.95454545 0.90909091 0.90909091 0.90909091] mean value: 0.9090909090909091 key: train_recall value: [1. 1. 1. 1. 1. 0.97979798 0.92424242 0.92424242 0.91919192 0.96969697] mean value: 0.9717171717171718 key: test_roc_auc value: [0.93280632 0.88833992 0.79841897 0.86956522 0.93379447 0.72727273 0.93181818 0.88636364 0.88636364 0.79545455] mean value: 0.8650197628458498 key: train_roc_auc value: [0.99752475 1. 0.99752475 1. 1. 0.79531771 0.8980818 0.90547097 0.8930935 0.8764741 ] mean value: 0.9363487580285123 key: test_jcc value: [0.86956522 0.79166667 0.64 0.78571429 0.875 0.63636364 0.875 0.8 0.8 0.68965517] mean value: 0.7762964978549687 key: train_jcc value: [0.99497487 1. 0.99497487 1. 1. 
0.70036101 0.81696429 0.8280543 0.80888889 0.79338843] mean value: 0.8937606662571818 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04109359 0.02866888 0.02903247 0.02609515 0.02982569 0.026232 0.02595305 0.02627039 0.02766967 0.02752471] mean value: 0.028836560249328614 key: score_time value: [0.01269889 0.01240039 0.01267838 0.01267815 0.01238513 0.01267934 0.01254892 0.01264668 0.01244855 0.01284599] mean value: 0.012601041793823242 key: test_mcc value: [0.86732843 0.86758893 0.86732843 0.82574419 1. 0.73029674 0.86452993 0.87177979 0.82158384 0.81818182] mean value: 0.8534362113672467 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93333333 0.93333333 0.93333333 0.91111111 1. 0.86363636 0.93181818 0.93181818 0.90909091 0.90909091] mean value: 0.9256565656565656 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93023256 0.93333333 0.93023256 0.91304348 1. 0.86956522 0.93333333 0.93617021 0.9047619 0.90909091] mean value: 0.9259763505216682 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.95238095 0.91304348 0.95238095 0.875 1. 0.83333333 0.91304348 0.88 0.95 0.90909091] mean value: 0.9178273103707886 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.95454545 0.90909091 0.95454545 1. 0.90909091 0.95454545 1. 0.86363636 0.90909091] mean value: 0.9363636363636364 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93280632 0.93379447 0.93280632 0.91205534 1. 0.86363636 0.93181818 0.93181818 0.90909091 0.90909091] mean value: 0.9256916996047431 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86956522 0.875 0.86956522 0.84 1. 0.76923077 0.875 0.88 0.82608696 0.83333333] mean value: 0.863778149386845 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.19880676 0.16679668 0.24343967 0.16726899 0.29743075 0.16902733 0.17551208 0.1705997 0.18266988 0.17188907] mean value: 0.19434409141540526 key: score_time value: [0.02430034 0.02431679 0.0246942 0.02460122 0.02689743 0.02486563 0.02493906 0.02493906 0.02535796 0.02534413] mean value: 0.025025582313537596 key: test_mcc value: [0.86732843 0.82506438 0.60000118 0.69583743 0.78530224 0.86452993 0.7800135 0.7800135 0.77352678 0.77352678] mean value: 0.7745144142569061 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93333333 0.91111111 0.8 0.84444444 0.88888889 0.93181818 0.88636364 0.88636364 0.88636364 0.88636364] mean value: 0.8855050505050505 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93023256 0.9047619 0.79069767 0.85106383 0.89361702 0.93333333 0.87804878 0.89361702 0.88372093 0.88888889] mean value: 0.8847981942603055 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95238095 0.95 0.80952381 0.8 0.84 0.91304348 0.94736842 0.84 0.9047619 0.86956522] mean value: 0.8826643783371472 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.86363636 0.77272727 0.90909091 0.95454545 0.95454545 0.81818182 0.95454545 0.86363636 0.90909091] mean value: 0.8909090909090909 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.93280632 0.91007905 0.79940711 0.8458498 0.89031621 0.93181818 0.88636364 0.88636364 0.88636364 0.88636364] mean value: 0.8855731225296443 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86956522 0.82608696 0.65384615 0.74074074 0.80769231 0.875 0.7826087 0.80769231 0.79166667 0.8 ] mean value: 0.7954899046203394 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01551032 0.01521516 0.01527309 0.01511455 0.01513052 0.01546001 0.01533055 0.01527858 0.01553512 0.01524663] mean value: 0.015309453010559082 key: score_time value: [0.01284385 0.01274657 0.01290631 0.01287198 0.01276088 0.01284337 0.01283884 0.01324463 0.01275945 0.01283479] mean value: 0.012865066528320312 key: test_mcc value: [0.55666994 0.38112585 0.68972332 0.19881069 0.46930785 0.77352678 0.32673202 0.45454545 0.54545455 0.50051733] mean value: 0.48964137935205826 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.77777778 0.68888889 0.84444444 0.6 0.73333333 0.88636364 0.65909091 0.72727273 0.77272727 0.75 ] mean value: 0.743989898989899 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76190476 0.65 0.84444444 0.57142857 0.73913043 0.88372093 0.61538462 0.72727273 0.77272727 0.74418605] mean value: 0.7310199804689188 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.72222222 0.82608696 0.6 0.70833333 0.9047619 0.70588235 0.72727273 0.77272727 0.76190476] mean value: 0.7529191531685138 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.59090909 0.86363636 0.54545455 0.77272727 0.86363636 0.54545455 0.72727273 0.77272727 0.72727273] mean value: 0.7136363636363636 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77667984 0.68675889 0.84486166 0.59881423 0.73418972 0.88636364 0.65909091 0.72727273 0.77272727 0.75 ] mean value: 0.7436758893280633 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.61538462 0.48148148 0.73076923 0.4 0.5862069 0.79166667 0.44444444 0.57142857 0.62962963 0.59259259] mean value: 0.5843604128948956 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.6 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.48509145 1.72775126 1.72003531 1.70660305 1.7010572 1.70190072 1.67853093 1.70248485 1.73082352 1.76075959] mean value: 1.7915037870407104 key: score_time value: [0.10180235 0.10169816 0.09832668 0.10011482 0.10132575 0.09362698 0.09398794 0.09288573 0.10091448 0.09981847] mean value: 0.09845013618469238 key: test_mcc value: [0.95652174 0.91452919 0.91106719 0.86758893 1. 0.90909091 0.95553309 0.87177979 0.82158384 0.81818182] mean value: 0.9025876492117845 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97777778 0.95555556 0.95555556 0.93333333 1. 0.95454545 0.97727273 0.93181818 0.90909091 0.90909091] mean value: 0.9504040404040404 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97777778 0.95238095 0.95454545 0.93333333 1. 0.95454545 0.97674419 0.93617021 0.9047619 0.90909091] mean value: 0.9499350185248255 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.95652174 1. 0.95454545 0.91304348 1. 0.95454545 1. 0.88 0.95 0.90909091] mean value: 0.9517747035573123 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.95454545 0.95454545 1. 0.95454545 0.95454545 1. 0.86363636 0.90909091] mean value: 0.95 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97826087 0.95454545 0.9555336 0.93379447 1. 0.95454545 0.97727273 0.93181818 0.90909091 0.90909091] mean value: 0.9503952569169961 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.95652174 0.90909091 0.91304348 0.875 1. 0.91304348 0.95454545 0.88 0.82608696 0.83333333] mean value: 0.906066534914361 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.95 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [2.09929252 1.04539609 1.14330029 0.98038101 0.95856428 0.94347358 1.00293994 0.99390554 0.91637397 1.01709938] mean value: 1.110072660446167 key: score_time value: [0.2254622 0.14489126 0.16381502 0.14184189 0.19981694 0.18624687 0.16025591 0.16381979 0.13914704 0.18514252] mean value: 0.1710439443588257 key: test_mcc value: [0.91106719 0.91452919 0.86732843 0.73663511 0.91485328 0.90909091 0.91287093 0.87177979 0.82158384 0.81818182] mean value: 0.8677920483015767 key: train_mcc value: [0.95500519 0.949995 0.94500356 0.95017516 0.949995 0.95015803 0.95011693 0.94513697 0.94513697 0.96009355] mean value: 0.9500816369848698 key: test_accuracy value: [0.95555556 0.95555556 0.93333333 0.86666667 0.95555556 0.95454545 0.95454545 0.93181818 0.90909091 0.90909091] mean value: 0.9325757575757576 key: train_accuracy value: [0.9775 0.975 0.9725 0.975 0.975 0.97506234 0.97506234 0.97256858 0.97256858 0.98004988] mean value: 0.9750311720698255 key: test_fscore value: [0.95454545 0.95238095 0.93023256 0.86956522 0.95652174 0.95454545 0.95238095 0.93617021 0.9047619 0.90909091] mean value: 0.9320195355132859 key: train_fscore value: [0.97721519 0.97474747 0.9721519 0.9744898 0.97474747 0.97461929 0.97474747 0.9721519 0.9721519 0.97979798] mean value: 0.9746820375374823 key: test_precision value: [0.95454545 1. 0.95238095 0.83333333 0.91666667 0.95454545 1. 
0.88 0.95 0.90909091] mean value: 0.935056277056277 key: train_precision value: [0.97969543 0.97474747 0.97461929 0.98453608 0.97474747 0.97959184 0.97474747 0.97461929 0.97461929 0.97979798] mean value: 0.977172162274171 key: test_recall value: [0.95454545 0.90909091 0.90909091 0.90909091 1. 0.95454545 0.90909091 1. 0.86363636 0.90909091] mean value: 0.9318181818181818 key: train_recall value: [0.97474747 0.97474747 0.96969697 0.96464646 0.97474747 0.96969697 0.97474747 0.96969697 0.96969697 0.97979798] mean value: 0.9722222222222222 key: test_roc_auc value: [0.9555336 0.95454545 0.93280632 0.86758893 0.95652174 0.95454545 0.95454545 0.93181818 0.90909091 0.90909091] mean value: 0.932608695652174 key: train_roc_auc value: [0.97747275 0.9749975 0.97247225 0.97489749 0.9749975 0.97499627 0.97505847 0.97253321 0.97253321 0.98004677] mean value: 0.9750005419261139 key: test_jcc value: [0.91304348 0.90909091 0.86956522 0.76923077 0.91666667 0.91304348 0.90909091 0.88 0.82608696 0.83333333] mean value: 0.873915171784737 key: train_jcc value: [0.95544554 0.95073892 0.94581281 0.95024876 0.95073892 0.95049505 0.95073892 0.94581281 0.94581281 0.96039604] mean value: 0.9506240562296064 MCC on Blind test: 0.93 Accuracy on Blind test: 0.96 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01235938 0.01207018 0.01216507 0.01198816 0.01195812 0.01208854 0.01212764 0.01205707 0.01214314 0.01219463] mean value: 0.012115192413330079 key: score_time value: [0.01045775 0.01042986 0.0105226 0.01041794 0.01037836 0.01043344 0.01043797 0.01044178 0.01040506 0.01040316] mean value: 0.010432791709899903 key: test_mcc value: [0.68911026 0.77821935 0.64426877 0.60079051 0.70780516 0.63636364 0.73029674 0.63636364 0.5547002 0.77352678] mean value: 0.6751445052594404 key: train_mcc value: [0.73006509 0.73513714 0.7700385 0.74497106 0.74497106 0.71074778 0.7306343 0.75588396 0.69693637 0.75588396] mean value: 0.7375269233005722 key: test_accuracy value: [0.84444444 0.88888889 0.82222222 0.8 0.84444444 0.81818182 0.86363636 0.81818182 0.77272727 0.88636364] mean value: 0.8359090909090909 key: train_accuracy value: [0.865 0.8675 0.885 0.8725 0.8725 0.8553616 0.86533666 0.87780549 0.8478803 0.87780549] mean value: 0.8686689526184539 key: test_fscore value: [0.8372093 0.88372093 0.81818182 0.8 0.85714286 0.81818182 0.85714286 0.81818182 0.75 0.88888889] mean value: 0.8328650290278198 key: train_fscore value: [0.8622449 0.86445013 0.88442211 0.87088608 0.87088608 0.85204082 0.86294416 0.87780549 0.84073107 0.87780549] mean value: 0.866421631011566 key: test_precision value: [0.85714286 0.9047619 0.81818182 0.7826087 0.77777778 0.81818182 0.9 0.81818182 0.83333333 0.86956522] mean value: 0.8379735240604806 key: train_precision value: [0.87113402 0.87564767 0.88 0.87309645 0.87309645 0.86082474 0.86734694 0.86699507 0.87027027 0.86699507] mean value: 0.8705406681510427 key: 
test_recall value: [0.81818182 0.86363636 0.81818182 0.81818182 0.95454545 0.81818182 0.81818182 0.81818182 0.68181818 0.90909091] mean value: 0.8318181818181818 key: train_recall value: [0.85353535 0.85353535 0.88888889 0.86868687 0.86868687 0.84343434 0.85858586 0.88888889 0.81313131 0.88888889] mean value: 0.8626262626262626 key: test_roc_auc value: [0.84387352 0.88833992 0.82213439 0.80039526 0.84683794 0.81818182 0.86363636 0.81818182 0.77272727 0.88636364] mean value: 0.8360671936758893 key: train_roc_auc value: [0.86488649 0.86736174 0.8850385 0.87246225 0.87246225 0.85521471 0.86525352 0.87794198 0.84745236 0.87794198] mean value: 0.8686015769064591 key: test_jcc value: [0.72 0.79166667 0.69230769 0.66666667 0.75 0.69230769 0.75 0.69230769 0.6 0.8 ] mean value: 0.715525641025641 key: train_jcc value: [0.75784753 0.76126126 0.79279279 0.77130045 0.77130045 0.74222222 0.75892857 0.78222222 0.72522523 0.78222222] mean value: 0.7645322947867791 MCC on Blind test: 0.75 Accuracy on Blind test: 0.88 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.6125834 0.7919445 0.91690707 0.64126277 0.27586269 3.05022645 1.53905249 2.51379037 1.85096502 1.52625299] mean value: 1.3718847751617431 key: score_time value: [0.01470351 0.01476789 0.01252818 0.01451087 0.01576161 0.01224637 0.01332831 0.0270524 0.01303458 0.01383972] mean value: 0.015177345275878907 key: test_mcc value: [0.95652174 0.91106719 0.91106719 0.91485328 1. 0.90909091 0.86452993 0.90909091 0.95553309 0.81818182] mean value: 0.9149936062423393 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97777778 0.95555556 0.95555556 0.95555556 1. 0.95454545 0.93181818 0.95454545 0.97727273 0.90909091] mean value: 0.9571717171717172 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97777778 0.95454545 0.95454545 0.95652174 1. 0.95454545 0.93023256 0.95454545 0.97674419 0.90909091] mean value: 0.9568548988366986 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95652174 0.95454545 0.95454545 0.91666667 1. 0.95454545 0.95238095 0.95454545 1. 0.90909091] mean value: 0.9552842085450781 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.95454545 0.95454545 1. 1. 0.95454545 0.90909091 0.95454545 0.95454545 0.90909091] mean value: 0.9590909090909091 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.97826087 0.9555336 0.9555336 0.95652174 1. 0.95454545 0.93181818 0.95454545 0.97727273 0.90909091] mean value: 0.9573122529644269 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.95652174 0.91304348 0.91304348 0.91666667 1. 0.91304348 0.86956522 0.91304348 0.95454545 0.83333333] mean value: 0.9182806324110672 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), 
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05299616 0.07507491 0.09356999 0.1008358 0.09290576 0.09139204 0.07153034 0.07866263 0.10971498 0.08641815] mean value: 0.08531007766723633 key: score_time value: [0.02284002 0.02450013 0.01256418 0.02179718 0.02442145 0.02206469 0.02130103 0.02369428 0.0252614 0.02076888] mean value: 0.021921324729919433 key: test_mcc value: [0.68911026 0.73559956 0.73320158 0.82574419 0.78530224 0.81818182 0.63636364 0.63636364 0.68252363 0.6882472 ] mean value: 0.7230637763883909 key: train_mcc value: [0.90500656 0.93043262 0.91500719 0.91500719 0.90500656 0.91026694 0.92519156 0.90522754 0.93021868 0.94034232] mean value: 0.9181707163710942 key: test_accuracy value: [0.84444444 0.86666667 
0.86666667 0.91111111 0.88888889 0.90909091 0.81818182 0.81818182 0.84090909 0.84090909] mean value: 0.8605050505050506 key: train_accuracy value: [0.9525 0.965 0.9575 0.9575 0.9525 0.95511222 0.96259352 0.95261845 0.96508728 0.97007481] mean value: 0.9590486284289277 key: test_fscore value: [0.8372093 0.85714286 0.86363636 0.91304348 0.89361702 0.90909091 0.81818182 0.81818182 0.84444444 0.85106383] mean value: 0.8605611842328492 key: train_fscore value: [0.95214106 0.96517413 0.95717884 0.95717884 0.95214106 0.95477387 0.96221662 0.95189873 0.96482412 0.97 ] mean value: 0.9587527276654001 key: test_precision value: [0.85714286 0.9 0.86363636 0.875 0.84 0.90909091 0.81818182 0.81818182 0.82608696 0.8 ] mean value: 0.8507320722755506 key: train_precision value: [0.94974874 0.95098039 0.95477387 0.95477387 0.94974874 0.95 0.95979899 0.95431472 0.96 0.96039604] mean value: 0.9544535373678533 key: test_recall value: [0.81818182 0.81818182 0.86363636 0.95454545 0.95454545 0.90909091 0.81818182 0.81818182 0.86363636 0.90909091] mean value: 0.8727272727272728 key: train_recall value: [0.95454545 0.97979798 0.95959596 0.95959596 0.95454545 0.95959596 0.96464646 0.94949495 0.96969697 0.97979798] mean value: 0.9631313131313132 key: test_roc_auc value: [0.84387352 0.86561265 0.86660079 0.91205534 0.89031621 0.90909091 0.81818182 0.81818182 0.84090909 0.84090909] mean value: 0.8605731225296442 key: train_roc_auc value: [0.95252025 0.96514651 0.95752075 0.95752075 0.95252025 0.95516744 0.9626188 0.95257999 0.96514405 0.97019456] mean value: 0.9590933354419185 key: test_jcc value: [0.72 0.75 0.76 0.84 0.80769231 0.83333333 0.69230769 0.69230769 0.73076923 0.74074074] mean value: 0.7567150997150998 key: train_jcc value: [0.90865385 0.93269231 0.9178744 0.9178744 0.90865385 0.91346154 0.92718447 0.90821256 0.93203883 0.94174757] mean value: 0.9208393764904951 MCC on Blind test: 0.7 Accuracy on Blind test: 0.85 Model_name: Multinomial Model func: MultinomialNB() List of models: 
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01488352 0.01467943 0.01454282 0.01407647 0.01490998 0.01458836 0.02097845 0.01497102 0.01447248 0.01468301] mean value: 0.01527855396270752 key: score_time value: [0.01296782 0.01293969 0.0132544 0.0121994 0.01727891 0.01300836 0.01371241 0.01255512 0.01260257 0.01297855] mean value: 0.01334972381591797 key: test_mcc value: [0.70501339 0.73559956 0.60079051 0.64613475 0.78530224 0.77352678 0.77352678 0.7800135 0.50471461 0.77352678] mean value: 0.7078148917725633 key: train_mcc value: [0.73501647 0.71071591 0.78497756 0.75023791 0.72513051 0.72622252 0.69655581 0.75082817 0.66140847 0.76056935] mean value: 0.7301662688450195 key: test_accuracy value: [0.84444444 0.86666667 0.8 0.82222222 0.88888889 0.88636364 0.88636364 0.88636364 0.75 0.88636364] mean value: 0.8517676767676767 key: train_accuracy value: [0.8675 0.855 0.8925 0.875 0.8625 0.86284289 0.8478803 0.87531172 0.83042394 0.88029925] mean value: 0.8649258104738154 key: test_fscore value: [0.82051282 0.85714286 0.8 0.80952381 0.89361702 0.88372093 0.88372093 0.89361702 0.73170732 0.88372093] mean value: 0.8457283637503524 key: train_fscore value: [0.86513995 0.84974093 0.89113924 0.87179487 0.85933504 0.85788114 0.84155844 0.87179487 0.8238342 0.87817259] mean value: 0.8610391268444171 key: test_precision value: 
[0.94117647 0.9 0.7826087 0.85 0.84 0.9047619 0.9047619 0.84 0.78947368 0.9047619 ] mean value: 0.865754456473665 key: train_precision value: [0.87179487 0.87234043 0.89340102 0.88541667 0.87046632 0.87830688 0.86631016 0.88541667 0.84574468 0.88265306] mean value: 0.8751850747942309 key: test_recall value: [0.72727273 0.81818182 0.81818182 0.77272727 0.95454545 0.86363636 0.86363636 0.95454545 0.68181818 0.86363636] mean value: 0.8318181818181818 key: train_recall value: [0.85858586 0.82828283 0.88888889 0.85858586 0.84848485 0.83838384 0.81818182 0.85858586 0.8030303 0.87373737] mean value: 0.8474747474747475 key: test_roc_auc value: [0.84189723 0.86561265 0.80039526 0.82114625 0.89031621 0.88636364 0.88636364 0.88636364 0.75 0.88636364] mean value: 0.8514822134387352 key: train_roc_auc value: [0.86741174 0.85473547 0.89246425 0.87483748 0.86236124 0.86254167 0.84751455 0.87510574 0.83008658 0.88021844] mean value: 0.8647277166140259 key: test_jcc value: [0.69565217 0.75 0.66666667 0.68 0.80769231 0.79166667 0.79166667 0.80769231 0.57692308 0.79166667] mean value: 0.7359626532887402 key: train_jcc value: [0.76233184 0.73873874 0.80365297 0.77272727 0.75336323 0.75113122 0.7264574 0.77272727 0.70044053 0.78280543] mean value: 0.7564375898815598 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, 
random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02090478 0.01843095 0.04428411 0.05754352 0.04021621 0.04565191 0.04121709 0.05066919 0.06244564 0.05169058] mean value: 0.043305397033691406 key: score_time value: [0.01230383 0.02153182 0.01792741 0.03710651 0.03666353 0.01897454 0.02727604 0.02798796 0.02072001 0.02464175] mean value: 0.02451333999633789 key: test_mcc value: [0.82506438 0.72299881 0.43884363 0.77865613 0.87476705 0.73960026 0.68313005 0.73029674 0.75592895 0.73029674] mean value: 0.7279582734082828 key: train_mcc value: [0.85528899 0.79408263 0.42430608 0.86966298 0.82348041 0.80626333 0.81054468 0.86083265 0.76826689 0.87858211] mean value: 0.7891310741692911 key: test_accuracy value: [0.91111111 0.84444444 0.66666667 0.88888889 0.93333333 0.86363636 0.81818182 0.86363636 0.86363636 0.86363636] mean value: 0.8517171717171718 key: train_accuracy value: [0.9275 0.89 0.655 0.9325 0.91 0.89526185 0.89775561 0.9276808 0.87281796 0.93765586] mean value: 0.8846172069825436 key: test_fscore value: [0.9047619 0.81081081 0.48275862 0.88888889 0.93617021 0.85 0.77777778 0.85714286 0.84210526 0.86956522] mean value: 0.8219981553387051 key: train_fscore value: [0.9276808 0.87709497 0.46511628 0.928 0.91304348 0.88202247 0.88515406 0.92225201 0.85302594 0.93946731] mean value: 0.8592857320609378 key: test_precision value: [0.95 1. 1. 0.86956522 0.88 0.94444444 1. 0.9 1. 0.83333333] mean value: 0.9377342995169082 key: train_precision value: [0.91625616 0.98125 1. 
0.98305085 0.875 0.99367089 0.99371069 0.98285714 0.99328859 0.90232558] mean value: 0.9621409897849462 key: test_recall value: [0.86363636 0.68181818 0.31818182 0.90909091 1. 0.77272727 0.63636364 0.81818182 0.72727273 0.90909091] mean value: 0.7636363636363637 key: train_recall value: [0.93939394 0.79292929 0.3030303 0.87878788 0.95454545 0.79292929 0.7979798 0.86868687 0.74747475 0.97979798] mean value: 0.8055555555555556 key: test_roc_auc value: [0.91007905 0.84090909 0.65909091 0.88932806 0.93478261 0.86363636 0.81818182 0.86363636 0.86363636 0.86363636] mean value: 0.8506916996047431 key: train_roc_auc value: [0.92761776 0.8890389 0.65151515 0.9319682 0.91044104 0.89400159 0.89652684 0.92695427 0.87127432 0.93817485] mean value: 0.8837512938485966 key: test_jcc value: [0.82608696 0.68181818 0.31818182 0.8 0.88 0.73913043 0.63636364 0.75 0.72727273 0.76923077] mean value: 0.7128084524171481 key: train_jcc value: [0.86511628 0.78109453 0.3030303 0.86567164 0.84 0.78894472 0.79396985 0.85572139 0.74371859 0.88584475] mean value: 0.7723112058976718 MCC on Blind test: 0.58 Accuracy on Blind test: 0.76 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.06124377 0.04860258 0.05644703 0.0544219 0.04448938 0.02667141 0.06391263 0.04354548 0.06314015 0.04449439] mean value: 0.05069687366485596 key: score_time value: [0.04430127 0.02314234 0.02817941 0.02064776 0.02313566 0.01204276 0.0358007 0.02944756 0.0219655 0.02373385] mean value: 0.026239681243896484 key: test_mcc value: [0.78405645 0.73663511 0.73320158 0.73663511 0.8360602 0.7800135 0.68313005 0.56694671 0.82158384 0.21483446] mean value: 0.6893097003051051 key: train_mcc value: [0.85260278 0.85532995 0.920046 0.90106836 0.86219639 0.8725435 0.82264299 0.80465299 0.83772405 0.44233239] mean value: 0.8171139401076141 key: test_accuracy value: [0.88888889 0.86666667 0.86666667 0.86666667 0.91111111 0.88636364 0.81818182 0.77272727 0.90909091 0.56818182] mean value: 0.8354545454545454 key: train_accuracy value: [0.9225 0.925 0.96 0.95 0.93 0.93516209 0.90773067 0.89526185 0.91521197 0.66084788] mean value: 0.9001714463840399 key: test_fscore value: [0.87804878 0.86956522 0.86363636 0.86956522 0.91666667 0.89361702 0.77777778 0.73684211 0.91304348 0.68852459] mean value: 0.8407287218315779 key: train_fscore value: [0.91598916 0.92822967 0.95979899 0.94818653 0.93170732 0.93658537 0.899729 0.88268156 0.91943128 0.7443609 ] mean value: 0.9066699774774758 key: test_precision value: [0.94736842 0.83333333 0.86363636 0.83333333 0.84615385 0.84 1. 
0.875 0.875 0.53846154] mean value: 0.8452286835971047 key: train_precision value: [0.98830409 0.88181818 0.955 0.97340426 0.9009434 0.90566038 0.97076023 0.9875 0.86607143 0.59281437] mean value: 0.902227633803653 key: test_recall value: [0.81818182 0.90909091 0.86363636 0.90909091 1. 0.95454545 0.63636364 0.63636364 0.95454545 0.95454545] mean value: 0.8636363636363636 key: train_recall value: [0.85353535 0.97979798 0.96464646 0.92424242 0.96464646 0.96969697 0.83838384 0.7979798 0.97979798 1. ] mean value: 0.9272727272727272 key: test_roc_auc value: [0.88735178 0.86758893 0.86660079 0.86758893 0.91304348 0.88636364 0.81818182 0.77272727 0.90909091 0.56818182] mean value: 0.8356719367588933 key: train_roc_auc value: [0.92181718 0.92554255 0.960046 0.94974497 0.93034303 0.9355874 0.90687665 0.89406379 0.91600736 0.66502463] mean value: 0.9005053584176151 key: test_jcc value: [0.7826087 0.76923077 0.76 0.76923077 0.84615385 0.80769231 0.63636364 0.58333333 0.84 0.525 ] mean value: 0.7319613357656836 key: train_jcc value: [0.845 0.86607143 0.92270531 0.90147783 0.87214612 0.88073394 0.81773399 0.79 0.85087719 0.59281437] mean value: 0.833956019315672 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.29672027 0.4601624 0.43517661 0.41409087 0.33378577 0.4149332 0.38982415 0.22151065 0.21765399 0.2491889 ] mean value: 0.3433046817779541 key: score_time value: [0.0209012 0.02083015 0.0207603 0.0409019 0.04082274 0.0407865 0.02054024 0.0209372 0.02071619 0.04051232] mean value: 0.028770875930786134 key: test_mcc value: [1. 0.91452919 0.91106719 0.91485328 1. 0.91287093 0.86452993 0.95553309 0.95553309 0.86452993] mean value: 0.9293446631562545 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.95555556 0.95555556 0.95555556 1. 0.95454545 0.93181818 0.97727273 0.97727273 0.93181818] mean value: 0.963939393939394 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 0.95454545 0.95652174 1. 0.95652174 0.93023256 0.97777778 0.97674419 0.93023256] mean value: 0.9634956965290635 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.95454545 0.91666667 1. 0.91666667 0.95238095 0.95652174 1. 0.95238095] mean value: 0.9649162431771128 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.95454545 1. 1. 1. 0.90909091 1. 0.95454545 0.90909091] mean value: 0.9636363636363636 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.95454545 0.9555336 0.95652174 1. 0.95454545 0.93181818 0.97727273 0.97727273 0.93181818] mean value: 0.9639328063241107 key: train_roc_auc value: [1. 1. 1. 1. 
1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 0.91304348 0.91666667 1. 0.91666667 0.86956522 0.95652174 0.95454545 0.86956522] mean value: 0.930566534914361 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.11559272 0.07994819 0.09201694 0.09574533 0.08644199 0.09145522 0.11973166 0.12139821 0.11484957 0.12906265] mean value: 0.10462424755096436 key: score_time value: [0.0227809 0.02421069 0.02284312 0.02478456 0.02595878 0.02617145 0.0282135 0.02829266 0.02692151 0.02787161] mean value: 0.025804877281188965 key: test_mcc value: [0.95652174 0.91106719 0.91106719 0.91485328 1. 0.90909091 0.86452993 0.90909091 0.86452993 0.86452993] mean value: 0.9105281028070168 key: train_mcc value: [0.98004502 0.989999 0.9900495 0.99501169 0.99501169 0.97506905 0.99502376 0.98009308 0.98514815 0.98009308] mean value: 0.9865544028626636 key: test_accuracy value: [0.97777778 0.95555556 0.95555556 0.95555556 1. 
0.95454545 0.93181818 0.95454545 0.93181818 0.93181818] mean value: 0.9548989898989899 key: train_accuracy value: [0.99 0.995 0.995 0.9975 0.9975 0.98753117 0.99750623 0.99002494 0.9925187 0.99002494] mean value: 0.9932605985037406 key: test_fscore value: [0.97777778 0.95454545 0.95454545 0.95652174 1. 0.95454545 0.93023256 0.95454545 0.93023256 0.93333333] mean value: 0.9546279784702434 key: train_fscore value: [0.98984772 0.99494949 0.99497487 0.99746835 0.99746835 0.98734177 0.99746835 0.98984772 0.9924812 0.98984772] mean value: 0.9931695554980033 key: test_precision value: [0.95652174 0.95454545 0.95454545 0.91666667 1. 0.95454545 0.95238095 0.95454545 0.95238095 0.91304348] mean value: 0.9509175607001694 key: train_precision value: [0.99489796 0.99494949 0.99 1. 1. 0.98984772 1. 0.99489796 0.98507463 0.99489796] mean value: 0.9944565715102227 key: test_recall value: [1. 0.95454545 0.95454545 1. 1. 0.95454545 0.90909091 0.95454545 0.90909091 0.95454545] mean value: 0.9590909090909091 key: train_recall value: [0.98484848 0.99494949 1. 0.99494949 0.99494949 0.98484848 0.99494949 0.98484848 1. 0.98484848] mean value: 0.9919191919191919 key: test_roc_auc value: [0.97826087 0.9555336 0.9555336 0.95652174 1. 0.95454545 0.93181818 0.95454545 0.93181818 0.93181818] mean value: 0.9550395256916997 key: train_roc_auc value: [0.98994899 0.9949995 0.9950495 0.99747475 0.99747475 0.98749813 0.99747475 0.98996119 0.99261084 0.98996119] mean value: 0.9932453590186605 key: test_jcc value: [0.95652174 0.91304348 0.91304348 0.91666667 1. 
0.91304348 0.86956522 0.91304348 0.86956522 0.875 ] mean value: 0.9139492753623188 key: train_jcc value: [0.9798995 0.98994975 0.99 0.99494949 0.99494949 0.975 0.99494949 0.9798995 0.98507463 0.9798995 ] mean value: 0.9864571352920186 MCC on Blind test: 0.96 Accuracy on Blind test: 0.98 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.16882658 0.22207904 0.2207644 0.23970628 0.2196691 0.18807268 0.20044613 0.23684382 0.19207048 0.23526812] mean value: 0.21237466335296631 key: score_time value: [0.02288747 0.02833319 0.03328061 0.03298473 0.03413868 0.04017615 0.03194618 0.03593063 0.03210187 0.03190231] mean value: 0.03236818313598633 key: test_mcc value: [0.70501339 0.64613475 0.55666994 0.51185771 0.51089209 0.68252363 0.77352678 0.60678804 0.59152048 0.73029674] mean value: 0.6315223557482503 key: train_mcc value: [0.98510714 1. 0.99004752 0.99004752 0.99004752 0.99007143 0.99007143 0.99502376 0.99007143 0.99007143] mean value: 0.9910559193261419 key: test_accuracy value: [0.84444444 0.82222222 0.77777778 0.75555556 0.75555556 0.84090909 0.88636364 0.79545455 0.79545455 0.86363636] mean value: 0.8137373737373738 key: train_accuracy value: [0.9925 1. 
0.995 0.995 0.995 0.99501247 0.99501247 0.99750623 0.99501247 0.99501247] mean value: 0.9955056109725686 key: test_fscore value: [0.82051282 0.80952381 0.76190476 0.75555556 0.74418605 0.84444444 0.88372093 0.76923077 0.79069767 0.86956522] mean value: 0.8049342029726256 key: train_fscore value: [0.99236641 1. 0.99492386 0.99492386 0.99492386 0.99492386 0.99492386 0.99746835 0.99492386 0.99492386] mean value: 0.9954301771720262 key: test_precision value: [0.94117647 0.85 0.8 0.73913043 0.76190476 0.82608696 0.9047619 0.88235294 0.80952381 0.83333333] mean value: 0.8348270612592863 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.77272727 0.72727273 0.77272727 0.72727273 0.86363636 0.86363636 0.68181818 0.77272727 0.90909091] mean value: 0.7818181818181819 key: train_recall value: [0.98484848 1. 0.98989899 0.98989899 0.98989899 0.98989899 0.98989899 0.99494949 0.98989899 0.98989899] mean value: 0.990909090909091 key: test_roc_auc value: [0.84189723 0.82114625 0.77667984 0.75592885 0.75494071 0.84090909 0.88636364 0.79545455 0.79545455 0.86363636] mean value: 0.8132411067193676 key: train_roc_auc value: [0.99242424 1. 0.99494949 0.99494949 0.99494949 0.99494949 0.99494949 0.99747475 0.99494949 0.99494949] mean value: 0.9954545454545455 key: test_jcc value: [0.69565217 0.68 0.61538462 0.60714286 0.59259259 0.73076923 0.79166667 0.625 0.65384615 0.76923077] mean value: 0.676128505954593 key: train_jcc value: [0.98484848 1. 
0.98989899 0.98989899 0.98989899 0.98989899 0.98989899 0.99494949 0.98989899 0.98989899] mean value: 0.990909090909091 MCC on Blind test: 0.61 Accuracy on Blind test: 0.8 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.919806 1.02738333 0.90889049 0.87410593 1.04062057 1.07893395 1.05196571 0.85241294 0.64149427 0.65414619] mean value: 0.9049759387969971 key: score_time value: [0.01315594 0.01307869 0.01312375 0.02761984 0.0129056 0.01343155 0.01265907 0.01010847 0.01001835 0.00931072] mean value: 0.013541197776794434 key: test_mcc value: [0.95652174 0.91106719 0.91106719 0.91485328 1. 0.91287093 0.90909091 0.95553309 0.90909091 0.81818182] mean value: 0.9198277056731419 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.97777778 0.95555556 0.95555556 0.95555556 1. 0.95454545 0.95454545 0.97727273 0.95454545 0.90909091] mean value: 0.9594444444444444 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.97777778 0.95454545 0.95454545 0.95652174 1. 0.95652174 0.95454545 0.97777778 0.95454545 0.90909091] mean value: 0.9595871761089152 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.95652174 0.95454545 0.95454545 0.91666667 1. 0.91666667 0.95454545 0.95652174 0.95454545 0.90909091] mean value: 0.9473649538866931 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.95454545 0.95454545 1. 1. 1. 0.95454545 1. 0.95454545 0.90909091] mean value: 0.9727272727272728 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.97826087 0.9555336 0.9555336 0.95652174 1. 0.95454545 0.95454545 0.97727273 0.95454545 0.90909091] mean value: 0.9595849802371541 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.95652174 0.91304348 0.91304348 0.91666667 1. 0.91666667 0.91304348 0.95652174 0.91304348 0.83333333] mean value: 0.9231884057971014 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.1153357 0.06680679 0.06007385 0.09345102 0.07372856 0.1125803 0.13374567 0.08797312 0.09443545 0.10179043] mean value: 0.09399209022521973 key: score_time value: [0.0196929 0.02079773 0.020576 0.02017331 0.03191543 0.02789617 0.02618909 0.0400703 0.02141643 0.02100539] mean value: 0.024973273277282715 key: test_mcc value: [0.28827551 0.59109821 0.44008623 0.20198059 0.28827551 0.50471461 0.48795004 0.50051733 0.56694671 0.28347335] mean value: 0.4153318091778042 key: train_mcc value: [0.84199403 0.87973027 0.96074967 0.91500719 0.88626479 0.96539284 0.93515962 0.98514265 0.92771103 0.75793176] mean value: 0.9055083848914132 key: test_accuracy value: [0.64444444 0.77777778 0.71111111 0.6 0.64444444 0.75 0.72727273 0.75 0.77272727 0.63636364] mean value: 0.7014141414141414 key: train_accuracy value: [0.915 0.9375 0.98 0.9575 0.94 0.98254364 0.96758105 0.9925187 0.96259352 0.86533666] mean value: 0.9500573566084788 key: test_fscore value: [0.61904762 0.72222222 0.64864865 0.60869565 0.61904762 0.76595745 0.66666667 0.74418605 0.73684211 0.57894737] mean value: 0.6710261394811038 key: train_fscore value: [0.90607735 0.93333333 0.97938144 0.95717884 0.93548387 0.98254364 0.96708861 0.99236641 0.96062992 0.84210526] mean value: 0.9456188682100336 key: test_precision value: [0.65 0.92857143 0.8 0.58333333 0.65 0.72 0.85714286 0.76190476 0.875 0.6875 ] mean value: 0.7513452380952381 key: train_precision value: [1. 0.98870056 1. 0.95477387 1. 0.97044335 0.96954315 1. 1. 1. 
] mean value: 0.9883460931280301 key: test_recall value: [0.59090909 0.59090909 0.54545455 0.63636364 0.59090909 0.81818182 0.54545455 0.72727273 0.63636364 0.5 ] mean value: 0.6181818181818182 key: train_recall value: [0.82828283 0.88383838 0.95959596 0.95959596 0.87878788 0.99494949 0.96464646 0.98484848 0.92424242 0.72727273] mean value: 0.9106060606060606 key: test_roc_auc value: [0.64328063 0.77371542 0.70750988 0.60079051 0.64328063 0.75 0.72727273 0.75 0.77272727 0.63636364] mean value: 0.7004940711462451 key: train_roc_auc value: [0.91414141 0.9369687 0.97979798 0.95752075 0.93939394 0.98269642 0.96754491 0.99242424 0.96212121 0.86363636] mean value: 0.9496245930011721 key: test_jcc value: [0.44827586 0.56521739 0.48 0.4375 0.44827586 0.62068966 0.5 0.59259259 0.58333333 0.40740741] mean value: 0.5083292103948026 key: train_jcc value: [0.82828283 0.875 0.95959596 0.9178744 0.87878788 0.96568627 0.93627451 0.98484848 0.92424242 0.72727273] mean value: 0.8997865483479294 MCC on Blind test: 0.54 Accuracy on Blind test: 0.77 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.04879665 0.06443977 0.06408358 0.06347299 0.0626421 0.05391741 0.04901862 0.06241035 0.06304216 0.06274414] mean value: 0.05945677757263183 key: score_time value: [0.03340101 0.03197694 0.03278589 0.03180408 0.03367066 0.03720546 0.0333674 0.02973032 0.03171229 0.03540778] mean value: 0.0331061840057373 key: test_mcc value: [0.78405645 0.78405645 0.73320158 0.8360602 0.82574419 0.81818182 0.86452993 0.72727273 0.77352678 0.77352678] mean value: 0.7920156928908952 key: train_mcc value: [0.86529061 0.87073544 0.85500344 0.85510344 0.84528736 0.86562671 0.87541359 0.86032741 0.8705095 0.8903152 ] mean value: 0.8653612710221481 key: test_accuracy value: [0.88888889 0.88888889 0.86666667 0.91111111 0.91111111 0.90909091 0.93181818 0.86363636 0.88636364 0.88636364] mean value: 0.8943939393939394 key: train_accuracy value: [0.9325 0.935 0.9275 0.9275 0.9225 0.93266833 0.93765586 0.93017456 0.93516209 0.94513716] mean value: 0.9325798004987531 key: test_fscore value: [0.87804878 0.87804878 0.86363636 0.91666667 0.91304348 0.90909091 0.93023256 0.86363636 0.88888889 0.88888889] mean value: 0.8930181678184095 key: train_fscore value: [0.93266833 0.93564356 0.92695214 0.9273183 0.92269327 0.93266833 0.93734336 0.92929293 0.935 0.94472362] mean value: 0.9324303832120122 key: test_precision value: [0.94736842 0.94736842 0.86363636 0.84615385 0.875 0.90909091 0.95238095 0.86363636 0.86956522 0.86956522] mean value: 0.8943765711786307 key: train_precision value: [0.92118227 0.91747573 0.92462312 0.92039801 0.91133005 0.92118227 0.93034826 0.92929293 0.92574257 0.94 
] mean value: 0.9241575197221089 key: test_recall value: [0.81818182 0.81818182 0.86363636 1. 0.95454545 0.90909091 0.90909091 0.86363636 0.90909091 0.90909091] mean value: 0.8954545454545455 key: train_recall value: [0.94444444 0.95454545 0.92929293 0.93434343 0.93434343 0.94444444 0.94444444 0.92929293 0.94444444 0.94949495] mean value: 0.9409090909090909 key: test_roc_auc value: [0.88735178 0.88735178 0.86660079 0.91304348 0.91205534 0.90909091 0.93181818 0.86363636 0.88636364 0.88636364] mean value: 0.8943675889328064 key: train_roc_auc value: [0.93261826 0.93519352 0.92751775 0.92756776 0.92261726 0.93281336 0.93773946 0.93016371 0.93527641 0.94519082] mean value: 0.9326698310225111 key: test_jcc value: [0.7826087 0.7826087 0.76 0.84615385 0.84 0.83333333 0.86956522 0.76 0.8 0.8 ] mean value: 0.8074269788182832 key: train_jcc value: [0.87383178 0.87906977 0.86384977 0.86448598 0.85648148 0.87383178 0.88207547 0.86792453 0.87793427 0.8952381 ] mean value: 0.8734722914430403 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.55836701 0.52241182 0.42924356 0.42723894 0.42407441 0.45521879 0.46713424 0.43198848 0.44248724 0.43643761] mean value: 0.4594602108001709 key: score_time value: [0.03078747 0.03211427 0.03577662 0.03534436 0.03439927 0.0340023 0.0342772 0.03501654 0.03661156 0.03467274] mean value: 0.03430023193359375 key: test_mcc value: [0.78405645 0.78405645 0.60000118 0.8360602 0.82574419 0.81818182 0.81818182 0.72727273 0.77352678 0.77352678] mean value: 0.774060840755457 key: train_mcc value: [0.86529061 0.87073544 0.81510094 0.809981 0.84528736 0.86562671 0.90023387 0.86032741 0.8705095 0.8903152 ] mean value: 0.8593408045055162 key: test_accuracy value: [0.88888889 0.88888889 0.8 0.91111111 0.91111111 0.90909091 0.90909091 0.86363636 0.88636364 0.88636364] mean value: 0.8854545454545455 key: train_accuracy value: [0.9325 0.935 0.9075 0.905 0.9225 0.93266833 0.95012469 0.93017456 0.93516209 0.94513716] mean value: 0.9295766832917706 key: test_fscore value: [0.87804878 0.87804878 0.79069767 0.91666667 0.91304348 0.90909091 0.90909091 0.86363636 0.88888889 0.88888889] mean value: 0.883610133991771
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:107: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:110: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
key: train_fscore value: [0.93266833 0.93564356 0.90726817 0.9040404 0.92269327 0.93266833 0.94949495 0.92929293 0.935 0.94472362] mean value: 0.9293493560888269 key: test_precision value: [0.94736842 0.94736842 0.80952381 0.84615385 0.875 0.90909091 0.90909091 0.86363636 0.86956522 0.86956522] mean value: 0.884636311438371 key: train_precision value: [0.92118227 0.91747573 0.90049751 0.9040404 0.91133005 0.92118227 0.94949495 0.92929293 0.92574257 0.94 ] mean value: 0.9220238678959648 key: test_recall value: [0.81818182 0.81818182 0.77272727 1.
0.95454545 0.90909091 0.90909091 0.86363636 0.90909091 0.90909091] mean value: 0.8863636363636364 key: train_recall value: [0.94444444 0.95454545 0.91414141 0.9040404 0.93434343 0.94444444 0.94949495 0.92929293 0.94444444 0.94949495] mean value: 0.9368686868686869 key: test_roc_auc value: [0.88735178 0.88735178 0.79940711 0.91304348 0.91205534 0.90909091 0.90909091 0.86363636 0.88636364 0.88636364] mean value: 0.8853754940711462 key: train_roc_auc value: [0.93261826 0.93519352 0.90756576 0.9049905 0.92261726 0.93281336 0.95011693 0.93016371 0.93527641 0.94519082] mean value: 0.9296546526573839 key: test_jcc value: [0.7826087 0.7826087 0.65384615 0.84615385 0.84 0.83333333 0.83333333 0.76 0.8 0.8 ] mean value: 0.7931884057971015 key: train_jcc value: [0.87383178 0.87906977 0.83027523 0.82488479 0.85648148 0.87383178 0.90384615 0.86792453 0.87793427 0.8952381 ] mean value: 0.8683317871996343 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.1082418 0.11077762 0.11822391 0.08075857 0.08925223 0.06861591 0.10095501 0.08981323 0.08645415 0.09447336] mean value: 0.09475657939910889 key: score_time value: [0.02976584 0.03569388 0.0412004 0.01653552 0.02434349 0.02014518 0.02598023 0.02477598 0.03175187 0.02436733] mean value: 0.027455973625183105 key: test_mcc value: [0.77865613 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419 0.69583743 0.68911026 0.95652174 0.82213439] mean value: 0.7925409622270596 key: train_mcc value: [0.85688852 0.8716498 0.86172755 0.86177295 0.87664317 0.85687806 0.86188899 0.871768 0.85679795 0.86188899] mean value: 0.8637903997161932 key: test_accuracy value: [0.88888889 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111 0.84444444 0.84444444 0.97777778 0.91111111] mean value: 0.8955555555555555 key: train_accuracy value: [0.92839506 0.93580247 0.9308642 0.9308642 0.9382716 0.92839506 0.9308642 0.93580247 0.92839506 0.9308642 ] mean value: 0.9318518518518518 key: test_fscore value: [0.88888889 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348 0.85106383 0.8372093 0.97777778 0.90909091] mean value: 0.8966281710028886 key: train_fscore value: [0.92874693 0.93596059 0.93069307 0.93103448 0.93857494 0.92909535 0.93170732 0.93658537 0.92874693 0.93170732] mean value: 0.932285229379058 key: test_precision value: [0.90909091 0.82608696 0.91666667 0.875 0.95454545 0.875 0.8 0.85714286 0.95652174 0.90909091] mean value: 0.887914549218897 key: train_precision value: [0.92195122 0.93137255 0.93069307 0.92647059 0.93170732 0.9223301 0.92270531 0.92753623 
0.92647059 0.92270531] mean value: 0.9263942288373254 key: test_recall value: [0.86956522 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545 0.90909091 0.81818182 1. 0.90909091] mean value: 0.9069169960474308 key: train_recall value: [0.93564356 0.94059406 0.93069307 0.93564356 0.94554455 0.93596059 0.9408867 0.94581281 0.93103448 0.9408867 ] mean value: 0.9382700092669365 key: test_roc_auc value: [0.88932806 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534 0.8458498 0.84387352 0.97826087 0.91106719] mean value: 0.8957509881422925 key: train_roc_auc value: [0.92841292 0.93581427 0.93086378 0.93087597 0.93828952 0.92837634 0.93083939 0.93577769 0.92838853 0.93083939] mean value: 0.9318477783738965 key: test_jcc value: [0.8 0.7037037 0.88 0.80769231 0.875 0.84 0.74074074 0.72 0.95652174 0.83333333] mean value: 0.815699182460052 key: train_jcc value: [0.86697248 0.87962963 0.87037037 0.87096774 0.88425926 0.86757991 0.87214612 0.88073394 0.86697248 0.87214612] mean value: 0.8731778046396034 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.91950297 2.00819826 2.98527718 1.89961886 0.94165349 1.18352866 0.94931173 1.02501869 0.95997047 0.98270178] mean value: 1.4854782104492188 key: score_time value: [0.01864243 0.04957104 0.03117299 0.01511431 0.01509213 0.0130074 0.01559639 0.01299739 0.01515102 0.01567769] mean value: 0.020202279090881348 key: test_mcc value: [0.82574419 0.68911026 0.86732843 0.77821935 0.82574419 0.82574419 0.73663511 0.68911026 0.95652174 0.77821935] mean value: 0.797237707853288 key: train_mcc value: [0.8965753 0.90127552 0.89135736 0.89139819 0.90627515 0.82225691 0.89630533 0.84716163 0.88164702 0.89152603] mean value: 0.8825778442400698 key: test_accuracy value: [0.91111111 0.84444444 0.93333333 0.88888889 0.91111111 0.91111111 0.86666667 0.84444444 0.97777778 0.88888889] mean value: 0.8977777777777778 key: train_accuracy value: [0.94814815 0.95061728 0.94567901 0.94567901 0.95308642 0.91111111 0.94814815 
0.92345679 0.94074074 0.94567901] mean value: 0.9412345679012346 key: test_fscore value: [0.90909091 0.85106383 0.93617021 0.89361702 0.90909091 0.91304348 0.86956522 0.8372093 0.97777778 0.88372093] mean value: 0.8980349587999696 key: train_fscore value: [0.94865526 0.95024876 0.94554455 0.94527363 0.95331695 0.91176471 0.94840295 0.92457421 0.94146341 0.94634146] mean value: 0.9415585894135641 key: test_precision value: [0.95238095 0.83333333 0.91666667 0.875 0.95238095 0.875 0.83333333 0.85714286 0.95652174 0.9047619 ] mean value: 0.8956521739130434 key: train_precision value: [0.93719807 0.955 0.94554455 0.95 0.94634146 0.90731707 0.94607843 0.91346154 0.93236715 0.93719807] mean value: 0.9370506345899053 key: test_recall value: [0.86956522 0.86956522 0.95652174 0.91304348 0.86956522 0.95454545 0.90909091 0.81818182 1. 0.86363636] mean value: 0.9023715415019763 key: train_recall value: [0.96039604 0.94554455 0.94554455 0.94059406 0.96039604 0.91625616 0.95073892 0.93596059 0.95073892 0.95566502] mean value: 0.9461834853436082 key: test_roc_auc value: [0.91205534 0.84387352 0.93280632 0.88833992 0.91205534 0.91205534 0.86758893 0.84387352 0.97826087 0.88833992] mean value: 0.8979249011857707 key: train_roc_auc value: [0.94817832 0.95060479 0.94567868 0.94566649 0.95310442 0.91109838 0.94814174 0.92342584 0.94071599 0.94565429] mean value: 0.9412268936253231 key: test_jcc value: [0.83333333 0.74074074 0.88 0.80769231 0.83333333 0.84 0.76923077 0.72 0.95652174 0.79166667] mean value: 0.8172518890127586 key: train_jcc value: [0.90232558 0.90521327 0.89671362 0.89622642 0.91079812 0.83783784 0.90186916 0.85972851 0.88940092 0.89814815] mean value: 0.8898261577031877 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.02403355 0.01210332 0.01167107 0.01175499 0.01173353 0.01152921 0.01138949 0.01154685 0.01153827 0.01136518] mean value: 0.012866544723510741 key: score_time value: [0.01191664 0.01043248 0.01031709 0.01031232 0.00995064 0.01007748 0.00997305 0.00997305 0.01015615 0.00996041] mean value: 0.010306930541992188 key: test_mcc value: [0.86758893 0.55841694 0.60079051 0.61706091 0.74605372 0.73320158 0.42993591 0.60000118 0.59109821 0.69404997] mean value: 0.6438197861131165 key: train_mcc value: [0.67340117 0.66345741 0.67340117 0.68334493 0.69787618 0.66984267 0.647501 0.71790239 0.68673529 0.6682388 ] mean value: 0.6781701020371597 key: test_accuracy value: [0.93333333 0.77777778 0.8 0.8 0.86666667 0.86666667 0.71111111 0.8 0.77777778 0.84444444] mean value: 0.8177777777777778 key: train_accuracy value: [0.8345679 0.82962963 0.8345679 0.83950617 0.84691358 0.83209877 0.81481481 0.85679012 0.84197531 0.83209877] mean value: 0.8362962962962963 key: test_fscore value: [0.93333333 0.77272727 0.8 0.7804878 0.85714286 0.86363636 0.66666667 0.79069767 0.72222222 0.82926829] mean value: 0.8016182487708297 key: train_fscore value: [0.82414698 0.81889764 0.82414698 0.82939633 0.83769634 0.82105263 0.79108635 0.84895833 0.83505155 0.82291667] mean value: 0.8253349790533351 key: test_precision value: [0.95454545 0.80952381 0.81818182 0.88888889 0.94736842 0.86363636 0.76470588 0.80952381 0.92857143 0.89473684] mean value: 
0.8679682718382409 key: train_precision value: [0.87709497 0.87150838 0.87709497 0.88268156 0.88888889 0.88135593 0.91025641 0.90055249 0.87567568 0.87292818] mean value: 0.8838037458275947 key: test_recall value: [0.91304348 0.73913043 0.7826087 0.69565217 0.7826087 0.86363636 0.59090909 0.77272727 0.59090909 0.77272727] mean value: 0.750395256916996 key: train_recall value: [0.77722772 0.77227723 0.77722772 0.78217822 0.79207921 0.76847291 0.69950739 0.80295567 0.79802956 0.77832512] mean value: 0.774828073940399 key: test_roc_auc value: [0.93379447 0.77865613 0.80039526 0.80237154 0.86857708 0.86660079 0.70849802 0.79940711 0.77371542 0.84288538] mean value: 0.8174901185770751 key: train_roc_auc value: [0.83442667 0.82948837 0.83442667 0.83936497 0.84677852 0.83225626 0.81510023 0.85692338 0.84208409 0.83223187] mean value: 0.8363081012534751 key: test_jcc value: [0.875 0.62962963 0.66666667 0.64 0.75 0.76 0.5 0.65384615 0.56521739 0.70833333] mean value: 0.6748693174780132 key: train_jcc value: [0.70089286 0.69333333 0.70089286 0.70852018 0.72072072 0.69642857 0.65437788 0.73755656 0.71681416 0.69911504] mean value: 0.7028652163950665 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01156735 0.01157784 0.01160192 0.01164412 0.01163125 0.01170754 0.01092172 0.01057267 0.0117619 0.0108943 ] mean value: 0.011388063430786133 key: score_time value: [0.00963807 0.01005244 0.00998616 0.01013279 0.01015091 0.01013255 0.01019764 0.0097158 0.01023817 0.01015043] mean value: 0.010039496421813964 key: test_mcc value: [0.74605372 0.4229249 0.68972332 0.69404997 0.78530224 0.78530224 0.55841694 0.64426877 0.73559956 0.60637261] mean value: 0.666801427789491 key: train_mcc value: [0.7385111 0.69500224 0.73847923 0.75849711 0.72839898 0.76296152 0.77288136 0.7777832 0.72358281 0.73337398] mean value: 0.7429471510582903 key: test_accuracy value: [0.86666667 0.71111111 0.84444444 0.84444444 0.88888889 0.88888889 0.77777778 0.82222222 0.86666667 0.8 ] mean value: 0.8311111111111111 key: train_accuracy value: [0.8691358 0.84691358 0.8691358 0.87901235 0.86419753 0.88148148 0.88641975 0.88888889 0.8617284 0.86666667] mean value: 0.871358024691358 key: test_fscore value: [0.85714286 0.71111111 0.84444444 0.85714286 0.88372093 0.89361702 0.7826087 0.81818182 0.85714286 0.80851064] mean value: 0.8313623230625146 key: train_fscore value: [0.87041565 0.84183673 0.86716792 0.88077859 0.86352357 0.8817734 0.88613861 0.88943489 0.86341463 0.86633663] mean value: 0.8710820634544677 key: test_precision value: [0.94736842 0.72727273 0.86363636 0.80769231 0.95 0.84 0.75 0.81818182 0.9 0.76 ] mean value: 0.8364151637835848 key: train_precision value: [0.85990338 0.86842105 0.87817259 0.86602871 0.86567164 0.8817734 0.89054726 0.8872549 0.85507246 0.87064677] mean value: 
0.8723492167626019 key: test_recall value: [0.7826087 0.69565217 0.82608696 0.91304348 0.82608696 0.95454545 0.81818182 0.81818182 0.81818182 0.86363636] mean value: 0.8316205533596838 key: train_recall value: [0.88118812 0.81683168 0.85643564 0.8960396 0.86138614 0.8817734 0.8817734 0.89162562 0.87192118 0.86206897] mean value: 0.8701043749695166 key: test_roc_auc value: [0.86857708 0.71146245 0.84486166 0.84288538 0.89031621 0.89031621 0.77865613 0.82213439 0.86561265 0.8013834 ] mean value: 0.8316205533596839 key: train_roc_auc value: [0.86916549 0.84683949 0.86910452 0.87905428 0.86419061 0.88148076 0.88643125 0.88888211 0.86170317 0.86667805] mean value: 0.8713529727356972 key: test_jcc value: [0.75 0.55172414 0.73076923 0.75 0.79166667 0.80769231 0.64285714 0.69230769 0.75 0.67857143] mean value: 0.7145588606795503 key: train_jcc value: [0.77056277 0.72687225 0.76548673 0.78695652 0.75982533 0.78854626 0.79555556 0.80088496 0.75965665 0.76419214] mean value: 0.7718539151085453 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01085877 0.0109148 0.0109849 0.01096702 0.01075697 0.01008892 0.01093984 0.01145196 0.01152682 0.01099157] mean value: 0.01094815731048584 key: score_time value: [0.01825953 0.01752472 0.01740408 0.01779437 0.01755857 0.01792526 0.01943898 0.01912856 0.01847744 0.01721025] mean value: 0.018072175979614257 key: test_mcc value: [0.5169078 0.48698902 0.51185771 0.44008623 0.64752602 0.54071329 0.37774032 0.37774032 0.58158 0.60000118] mean value: 0.5081141873870801 key: train_mcc value: [0.69876844 0.71387102 0.68398976 0.72358281 0.68953436 0.71448494 0.70937299 0.6938666 0.69877579 0.72349713] mean value: 0.7049743847733497 key: test_accuracy value: [0.75555556 0.73333333 0.75555556 0.71111111 0.82222222 0.75555556 0.68888889 0.68888889 0.75555556 0.8 ] mean value: 0.7466666666666667 key: train_accuracy value: [0.84938272 0.85679012 0.84197531 0.8617284 0.84444444 0.85679012 0.85432099 0.84691358 0.84938272 0.8617284 ] mean value: 0.8523456790123457 key: test_fscore value: [0.74418605 0.7 0.75555556 0.75471698 0.81818182 0.78431373 0.66666667 0.66666667 0.66666667 0.79069767] mean value: 0.7347651801289877 key: train_fscore value: [0.84863524 0.85427136 0.84236453 0.86 0.84050633 0.85353535 0.85138539 0.84653465 0.84938272 0.86138614] mean value: 0.8508001705741715 key: test_precision value: [0.8 0.82352941 0.77272727 0.66666667 0.85714286 0.68965517 0.7 0.7 1. 
0.80952381] mean value: 0.7819245190239105 key: train_precision value: [0.85074627 0.86734694 0.83823529 0.86868687 0.86010363 0.87564767 0.87113402 0.85074627 0.85148515 0.86567164] mean value: 0.8599803745154699 key: test_recall value: [0.69565217 0.60869565 0.73913043 0.86956522 0.7826087 0.90909091 0.63636364 0.63636364 0.5 0.77272727] mean value: 0.7150197628458498 key: train_recall value: [0.84653465 0.84158416 0.84653465 0.85148515 0.82178218 0.83251232 0.83251232 0.84236453 0.84729064 0.85714286] mean value: 0.841974345217773 key: test_roc_auc value: [0.756917 0.73616601 0.75592885 0.70750988 0.82312253 0.75889328 0.68774704 0.68774704 0.75 0.79940711] mean value: 0.7463438735177865 key: train_roc_auc value: [0.8493757 0.85675267 0.84198654 0.86170317 0.84438863 0.85685022 0.85437497 0.84692484 0.84938789 0.86173975] mean value: 0.8523484368141248 key: test_jcc value: [0.59259259 0.53846154 0.60714286 0.60606061 0.69230769 0.64516129 0.5 0.5 0.5 0.65384615] mean value: 0.5835572730734021 key: train_jcc value: [0.73706897 0.74561404 0.72765957 0.75438596 0.72489083 0.74449339 0.74122807 0.73390558 0.73819742 0.75652174] mean value: 0.7403965575347853 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: SVM Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.0228219 0.02206445 0.02394438 0.02210402 0.02183819 0.02219343 0.02140975 0.01981401 0.02201772 0.0217278 ] mean value: 0.021993565559387206 key: score_time value: [0.01345062 0.01252699 0.01257086 0.01266456 0.01265097 0.01267982 0.01264071 0.01242614 0.01275897 0.01246643] mean value: 0.012683606147766114 key: test_mcc value: [0.73663511 0.68972332 0.86732843 0.73559956 0.86758893 0.82574419 0.77865613 0.68911026 0.95652174 0.73320158] mean value: 0.7880109260079755 key: train_mcc value: [0.79762457 0.82239025 0.80251189 0.80246793 0.80766419 0.81237958 0.80741373 0.82237294 0.80261491 0.80741373] mean value: 0.8084853722573764 key: test_accuracy value: [0.86666667 0.84444444 0.93333333 0.86666667 0.93333333 0.91111111 0.88888889 0.84444444 0.97777778 0.86666667] mean value: 0.8933333333333333 key: train_accuracy value: [0.89876543 0.91111111 0.90123457 0.90123457 0.9037037 0.90617284 0.9037037 0.91111111 0.90123457 0.9037037 ] mean value: 0.9041975308641975 key: test_fscore value: [0.86363636 0.84444444 0.93617021 0.875 0.93333333 0.91304348 0.88888889 0.8372093 0.97777778 0.86363636] mean value: 0.893314016506958 key: train_fscore value: [0.8992629 0.91176471 0.90147783 0.9009901 0.90464548 0.90686275 0.9041769 0.91219512 0.90243902 0.9041769 ] mean value: 0.9047991713233395 key: test_precision value: [0.9047619 0.86363636 0.91666667 0.84 0.95454545 0.875 0.86956522 0.85714286 0.95652174 0.86363636] mean value: 0.8901476566911349 key: train_precision value: [0.89268293 0.90291262 0.89705882 0.9009901 0.89371981 0.90243902 0.90196078 0.90338164 0.89371981 
0.90196078] mean value: 0.8990826319784146 key: test_recall value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545 0.90909091 0.81818182 1. 0.86363636] mean value: 0.8980237154150198 key: train_recall value: [0.90594059 0.92079208 0.90594059 0.9009901 0.91584158 0.91133005 0.90640394 0.92118227 0.91133005 0.90640394] mean value: 0.9106155196800468 key: test_roc_auc value: [0.86758893 0.84486166 0.93280632 0.86561265 0.93379447 0.91205534 0.88932806 0.84387352 0.97826087 0.86660079] mean value: 0.8934782608695653 key: train_roc_auc value: [0.8987831 0.91113496 0.90124616 0.90123397 0.9037336 0.90616007 0.90369702 0.91108618 0.90120958 0.90369702] mean value: 0.9041981661220309 key: test_jcc value: [0.76 0.73076923 0.88 0.77777778 0.875 0.84 0.8 0.72 0.95652174 0.76 ] mean value: 0.8100068747677444 key: train_jcc value: [0.81696429 0.83783784 0.8206278 0.81981982 0.82589286 0.82959641 0.82511211 0.83856502 0.82222222 0.82511211] mean value: 0.8261750475651821 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( 
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.65939665 1.75964832 0.57226229 0.91595173 0.58495402 0.47136617 0.58204484 0.84763002 0.33321142 0.77313566] mean value: 0.7499601125717164 key: score_time value: [0.01269627 0.01352549 0.01349235 0.01917243 0.01356983 0.01362419 0.01330686 0.01411915 0.01365542 0.01274872] mean value: 0.013991069793701173 key: test_mcc value: [0.73663511 0.670374 0.86732843 0.82506438 0.86758893 0.82574419 0.77821935 0.73559956 1. 0.74410286] mean value: 0.805065681579282 key: train_mcc value: [0.83730123 0.85146676 0.82742221 0.8520244 0.83086317 0.81729057 0.83012449 0.82799641 0.7927359 0.81956701] mean value: 0.8286792157204221 key: test_accuracy value: [0.86666667 0.82222222 0.93333333 0.91111111 0.93333333 0.91111111 0.88888889 0.86666667 1. 0.86666667] mean value: 0.9 key: train_accuracy value: [0.91851852 0.92345679 0.91358025 0.92592593 0.91358025 0.90864198 0.91358025 0.91358025 0.8962963 0.90864198] mean value: 0.9135802469135802 key: test_fscore value: [0.86363636 0.8 0.93617021 0.91666667 0.93333333 0.91304348 0.88372093 0.85714286 1. 
0.85 ] mean value: 0.8953713842038605 key: train_fscore value: [0.9193154 0.91906005 0.91442543 0.92647059 0.91725768 0.90909091 0.91002571 0.91183879 0.89756098 0.90537084] mean value: 0.9130416381528887 key: test_precision value: [0.9047619 0.94117647 0.91666667 0.88 0.95454545 0.875 0.9047619 0.9 1. 0.94444444] mean value: 0.922135684576861 key: train_precision value: [0.90821256 0.97237569 0.90338164 0.91747573 0.87782805 0.90686275 0.9516129 0.93298969 0.88888889 0.94148936] mean value: 0.920111726559678 key: test_recall value: [0.82608696 0.69565217 0.95652174 0.95652174 0.91304348 0.95454545 0.86363636 0.81818182 1. 0.77272727] mean value: 0.8756916996047431 key: train_recall value: [0.93069307 0.87128713 0.92574257 0.93564356 0.96039604 0.91133005 0.87192118 0.89162562 0.90640394 0.87192118] mean value: 0.9076964346680974 key: test_roc_auc value: [0.86758893 0.82509881 0.93280632 0.91007905 0.93379447 0.91205534 0.88833992 0.86561265 1. 0.86462451] mean value: 0.9 key: train_roc_auc value: [0.91854851 0.92332829 0.9136102 0.92594986 0.91369556 0.90863532 0.91368336 0.91363459 0.89627128 0.90873287] mean value: 0.9136089840511145 key: test_jcc value: [0.76 0.66666667 0.88 0.84615385 0.875 0.84 0.79166667 0.75 1. 
0.73913043] mean value: 0.8148617614269789 key: train_jcc value: [0.85067873 0.85024155 0.84234234 0.8630137 0.84716157 0.83333333 0.83490566 0.83796296 0.81415929 0.8271028 ] mean value: 0.8400901944397646 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02700424 0.02184582 0.02058792 0.02255273 0.01992631 0.01929307 0.0199616 0.0216949 0.02253938 0.02359295] mean value: 0.021899890899658204 key: score_time value: [0.01251793 0.00972891 0.00979257 0.00993633 0.00901151 0.00906038 0.00911379 0.00917554 0.00938892 0.00946093] mean value: 0.009718680381774902 key: test_mcc value: [0.77865613 0.91106719 0.82506438 0.95643752 0.82213439 0.91485328 0.95652174 0.77821935 0.95643752 0.91452919] mean value: 0.8813920675315654 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88888889 0.95555556 0.91111111 0.97777778 0.91111111 0.95555556 0.97777778 0.88888889 0.97777778 0.95555556] mean value: 0.94 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.88888889 0.95652174 0.91666667 0.9787234 0.91304348 0.95652174 0.97777778 0.88372093 0.97674419 0.95238095] mean value: 0.9400989762770414 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 0.95652174 0.88 0.95833333 0.91304348 0.91666667 0.95652174 0.9047619 1. 1. ] mean value: 0.9394939770374553 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86956522 0.95652174 0.95652174 1. 0.91304348 1. 1. 0.86363636 0.95454545 0.90909091] mean value: 0.9422924901185771 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.88932806 0.9555336 0.91007905 0.97727273 0.91106719 0.95652174 0.97826087 0.88833992 0.97727273 0.95454545] mean value: 0.9398221343873517 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.91666667 0.84615385 0.95833333 0.84 0.91666667 0.95652174 0.79166667 0.95454545 0.90909091] mean value: 0.8889645282253977 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12013102 0.11766171 0.11774349 0.11807728 0.11954403 0.11755204 0.13109112 0.13012242 0.13104153 0.13023996] mean value: 0.12332046031951904 key: score_time value: [0.01799774 0.0182426 0.01839256 0.01811171 0.01826453 0.02004719 0.02000332 0.01999044 0.01991343 0.01994133] mean value: 0.019090485572814942 key: test_mcc value: [0.82574419 0.64426877 0.91106719 0.78405645 0.78530224 0.8360602 0.73663511 0.64426877 0.91106719 0.82213439] mean value: 0.7900604515356031 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.82222222 0.95555556 0.88888889 0.88888889 0.91111111 0.86666667 0.82222222 0.95555556 0.91111111] mean value: 0.8933333333333333 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.82608696 0.95652174 0.89795918 0.88372093 0.91666667 0.86956522 0.81818182 0.95454545 0.90909091] mean value: 0.8941429784525263 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.95238095 0.82608696 0.95652174 0.84615385 0.95 0.84615385 0.83333333 0.81818182 0.95454545 0.90909091] mean value: 0.8892448855492334 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86956522 0.82608696 0.95652174 0.95652174 0.82608696 1. 0.90909091 0.81818182 0.95454545 0.90909091] mean value: 0.9025691699604743 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91205534 0.82213439 0.9555336 0.88735178 0.89031621 0.91304348 0.86758893 0.82213439 0.9555336 0.91106719] mean value: 0.8936758893280633 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.7037037 0.91666667 0.81481481 0.79166667 0.84615385 0.76923077 0.69230769 0.91304348 0.83333333] mean value: 0.8114254304471695 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01040983 0.01041746 0.01033044 0.01143312 0.01089478 0.01075268 0.01148319 0.01071334 0.01060081 0.01162791] mean value: 0.010866355895996094 key: score_time value: [0.00902987 0.00913858 0.00908208 0.00915885 0.00940609 0.00922942 0.00991488 0.0096333 0.00902796 0.00981998] mean value: 0.009344100952148438 key: test_mcc value: [0.46930785 0.51185771 0.82506438 0.60000118 0.43557241 0.37774032 0.24655092 0.60000118 0.33824342 0.56604076] mean value: 0.4970380107838396 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73333333 0.75555556 0.91111111 0.8 0.71111111 0.68888889 0.62222222 0.8 0.66666667 0.77777778] mean value: 0.7466666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72727273 0.75555556 0.91666667 0.80851064 0.68292683 0.66666667 0.56410256 0.79069767 0.61538462 0.79166667] mean value: 0.7319450604300232 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76190476 0.77272727 0.88 0.79166667 0.77777778 0.7 0.64705882 0.80952381 0.70588235 0.73076923] mean value: 0.7577310695840107 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.69565217 0.73913043 0.95652174 0.82608696 0.60869565 0.63636364 0.5 0.77272727 0.54545455 0.86363636] mean value: 0.7144268774703557 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.73418972 0.75592885 0.91007905 0.79940711 0.71343874 0.68774704 0.61956522 0.79940711 0.66403162 0.77964427] mean value: 0.7463438735177865 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57142857 0.60714286 0.84615385 0.67857143 0.51851852 0.5 0.39285714 0.65384615 0.44444444 0.65517241] mean value: 0.5868135376756066 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, 
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.74629807 1.81791091 1.8328793 1.83209157 1.79719567 1.78587461 1.83592582 1.77011418 1.84957242 1.9802525 ] mean value: 1.8248115062713623 key: score_time value: [0.09639883 0.09250951 0.13806105 0.10098362 0.12009382 0.10071588 0.10475492 0.10313916 0.1076386 0.10528183] mean value: 0.10695772171020508 key: test_mcc value: [0.86758893 0.91106719 0.86732843 0.95643752 0.82574419 0.95652174 0.82213439 0.77821935 1. 0.95643752] mean value: 0.894147926437764 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.93333333 0.95555556 0.93333333 0.97777778 0.91111111 0.97777778 0.91111111 0.88888889 1. 0.97777778] mean value: 0.9466666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.95652174 0.93617021 0.9787234 0.90909091 0.97777778 0.90909091 0.88372093 1. 0.97674419] mean value: 0.946117340172371 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95454545 0.95652174 0.91666667 0.95833333 0.95238095 0.95652174 0.90909091 0.9047619 1. 1. ] mean value: 0.950882269904009 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.95652174 0.95652174 1. 0.86956522 1. 0.90909091 0.86363636 1. 0.95454545] mean value: 0.9422924901185771 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93379447 0.9555336 0.93280632 0.97727273 0.91205534 0.97826087 0.91106719 0.88833992 1. 0.97727273] mean value: 0.9466403162055336 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.875 0.91666667 0.88 0.95833333 0.83333333 0.95652174 0.83333333 0.79166667 1. 0.95454545] mean value: 0.8999400527009223 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.95 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.19596362 1.18671298 1.15252948 1.52557898 1.15283728 0.99552059 2.06139922 0.94499493 1.02960014 0.97231007] mean value: 1.2217447280883789 key: score_time value: [0.16833639 0.15999269 0.18151951 0.1587429 0.12559843 0.14782834 0.18785882 0.23321199 0.17717195 0.24588728] mean value: 0.17861483097076417 key: test_mcc value: [0.86758893 0.82213439 0.86732843 0.95643752 0.82574419 0.95652174 0.82574419 0.77821935 1.
0.91452919] mean value: 0.8814247933518943 key: train_mcc value: [0.96049359 0.95061698 0.94568955 0.94078482 0.95556639 0.94569087 0.95066455 0.96544324 0.94078771 0.94078771] mean value: 0.9496525422288566 key: test_accuracy value: [0.93333333 0.91111111 0.93333333 0.97777778 0.91111111 0.97777778 0.91111111 0.88888889 1. 0.95555556] mean value: 0.94 key: train_accuracy value: [0.98024691 0.97530864 0.97283951 0.97037037 0.97777778 0.97283951 0.97530864 0.98271605 0.97037037 0.97037037] mean value: 0.9748148148148148 key: test_fscore value: [0.93333333 0.91304348 0.93617021 0.9787234 0.90909091 0.97777778 0.91304348 0.88372093 1. 0.95238095] mean value: 0.9397284476358546 key: train_fscore value: [0.98019802 0.97524752 0.97270471 0.97014925 0.97766749 0.97283951 0.97524752 0.98280098 0.97029703 0.97029703] mean value: 0.9747449079854762 key: test_precision value: [0.95454545 0.91304348 0.91666667 0.95833333 0.95238095 0.95652174 0.875 0.9047619 1. 1. ] mean value: 0.9431253529079616 key: train_precision value: [0.98019802 0.97524752 0.97512438 0.975 0.9800995 0.97524752 0.9800995 0.98039216 0.97512438 0.97512438] mean value: 0.9771657365473159 key: test_recall value: [0.91304348 0.91304348 0.95652174 1. 0.86956522 1. 0.95454545 0.86363636 1. 0.90909091] mean value: 0.9379446640316206 key: train_recall value: [0.98019802 0.97524752 0.97029703 0.96534653 0.97524752 0.97044335 0.97044335 0.98522167 0.96551724 0.96551724] mean value: 0.9723479490806224 key: test_roc_auc value: [0.93379447 0.91106719 0.93280632 0.97727273 0.91205534 0.97826087 0.91205534 0.88833992 1. 0.95454545] mean value: 0.9400197628458498 key: train_roc_auc value: [0.98024679 0.97530849 0.97283324 0.970358 0.97777155 0.97284544 0.97532068 0.98270985 0.97038238 0.97038238] mean value: 0.9748158806028386 key: test_jcc value: [0.875 0.84 0.88 0.95833333 0.83333333 0.95652174 0.84 0.79166667 1. 
0.90909091] mean value: 0.8883945981554677 key: train_jcc value: [0.96116505 0.95169082 0.9468599 0.94202899 0.95631068 0.94711538 0.95169082 0.96618357 0.94230769 0.94230769] mean value: 0.9507660603666303 MCC on Blind test: 0.91 Accuracy on Blind test: 0.96 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02524042 0.02612805 0.02330494 0.02426648 0.0341692 0.03412199 0.02433777 0.02824688 0.03730392 0.03393507] mean value: 0.029105472564697265 key: score_time value: [0.0212667 0.03293991 0.02683496 0.03234029 0.0220933 0.02196932 0.02211404 0.02257228 0.02409673 0.02418518] mean value: 0.02504127025604248 key: test_mcc value: [0.74605372 0.4229249 0.68972332 0.69404997 0.78530224 0.78530224 0.55841694 0.64426877 0.73559956 0.60637261] mean value: 0.666801427789491 key: train_mcc value: [0.7385111 0.69500224 0.73847923 0.75849711 0.72839898 0.76296152 0.77288136 0.7777832 0.72358281 0.73337398] mean value: 0.7429471510582903 key: test_accuracy value: [0.86666667 0.71111111 0.84444444 0.84444444 0.88888889 0.88888889 0.77777778 0.82222222 0.86666667 0.8 ] mean value: 0.8311111111111111 key: train_accuracy value: [0.8691358 0.84691358 0.8691358 0.87901235 0.86419753 0.88148148 0.88641975 0.88888889 0.8617284 0.86666667] 
mean value: 0.871358024691358 key: test_fscore value: [0.85714286 0.71111111 0.84444444 0.85714286 0.88372093 0.89361702 0.7826087 0.81818182 0.85714286 0.80851064] mean value: 0.8313623230625146 key: train_fscore value: [0.87041565 0.84183673 0.86716792 0.88077859 0.86352357 0.8817734 0.88613861 0.88943489 0.86341463 0.86633663] mean value: 0.8710820634544677 key: test_precision value: [0.94736842 0.72727273 0.86363636 0.80769231 0.95 0.84 0.75 0.81818182 0.9 0.76 ] mean value: 0.8364151637835848 key: train_precision value: [0.85990338 0.86842105 0.87817259 0.86602871 0.86567164 0.8817734 0.89054726 0.8872549 0.85507246 0.87064677] mean value: 0.8723492167626019 key: test_recall value: [0.7826087 0.69565217 0.82608696 0.91304348 0.82608696 0.95454545 0.81818182 0.81818182 0.81818182 0.86363636] mean value: 0.8316205533596838 key: train_recall value: [0.88118812 0.81683168 0.85643564 0.8960396 0.86138614 0.8817734 0.8817734 0.89162562 0.87192118 0.86206897] mean value: 0.8701043749695166 key: test_roc_auc value: [0.86857708 0.71146245 0.84486166 0.84288538 0.89031621 0.89031621 0.77865613 0.82213439 0.86561265 0.8013834 ] mean value: 0.8316205533596839 key: train_roc_auc value: [0.86916549 0.84683949 0.86910452 0.87905428 0.86419061 0.88148076 0.88643125 0.88888211 0.86170317 0.86667805] mean value: 0.8713529727356972 key: test_jcc value: [0.75 0.55172414 0.73076923 0.75 0.79166667 0.80769231 0.64285714 0.69230769 0.75 0.67857143] mean value: 0.7145588606795503 key: train_jcc value: [0.77056277 0.72687225 0.76548673 0.78695652 0.75982533 0.78854626 0.79555556 0.80088496 0.75965665 0.76419214] mean value: 0.7718539151085453 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, 
max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [5.04259777 5.1556654 4.94334316 4.85878825 4.41032028 4.46716976 1.56981826 3.87288809 5.31235576 4.7354219 ] mean value: 4.436836862564087 key: score_time value: [0.02622247 0.01853395 0.02413154 0.02783108 0.02140975 0.02586436 0.0147388 0.01681423 0.02183342 0.02106977] mean value: 0.021844935417175294 key: test_mcc value: [0.82213439 0.91106719 0.95643752 1. 0.86758893 0.91485328 0.95652174 0.77821935 1. 0.95643752] mean value: 0.9163259916823262 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.95555556 0.97777778 1. 0.93333333 0.95555556 0.97777778 0.88888889 1. 0.97777778] mean value: 0.9577777777777777 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.91304348 0.95652174 0.9787234 1. 
0.93333333 0.95652174 0.97777778 0.88372093 1. 0.97674419] mean value: 0.9576386588167239 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91304348 0.95652174 0.95833333 1. 0.95454545 0.91666667 0.95652174 0.9047619 1. 1. ] mean value: 0.9560394315829098 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.95652174 1. 1. 0.91304348 1. 1. 0.86363636 1. 0.95454545] mean value: 0.9600790513833992 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91106719 0.9555336 0.97727273 1. 0.93379447 0.95652174 0.97826087 0.88833992 1. 0.97727273] mean value: 0.957806324110672 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.84 0.91666667 0.95833333 1. 0.875 0.91666667 0.95652174 0.79166667 1. 0.95454545] mean value: 0.9209400527009223 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0814774 0.10963106 0.09837937 0.11255074 0.09978104 0.06058836 0.06621599 0.05549955 0.05097103 0.07600141] mean value: 0.08110959529876709 key: score_time value: [0.03861284 0.04575562 0.03341079 0.03420973 0.02950549 0.01280618 0.02309275 0.01282167 0.01282978 0.02283335] mean value: 0.026587820053100585 key: test_mcc value: [0.86758893 0.69404997 0.82213439 0.69404997 0.82213439 0.64752602 0.73663511 0.69404997 0.82213439 0.73559956] mean value: 0.7535902709223832 key: train_mcc value: [0.92103402 0.91129269 0.91111057 0.92103017 0.93581427 0.92117074 0.91605902 0.93126766 0.92602981 0.89630533] mean value: 0.9191114276703818 key: test_accuracy value: [0.93333333 0.84444444 0.91111111 0.84444444 0.91111111 0.82222222 0.86666667 0.84444444 0.91111111 0.86666667] mean value: 0.8755555555555555 key: train_accuracy value: [0.96049383 0.95555556 0.95555556 0.96049383 0.96790123 0.96049383 0.95802469 0.9654321 0.96296296 0.94814815] mean value: 0.9595061728395062 key: test_fscore value: [0.93333333 0.85714286 0.91304348 0.85714286 0.91304348 0.82608696 0.86956522 0.82926829 0.90909091 0.85714286] mean value: 0.8764860236970523 key: train_fscore value: [0.96059113 0.95588235 0.95544554 0.960199 0.96790123 0.960199 0.95823096 0.96601942 0.96277916 0.94840295] mean value: 0.9595650755455886 key: test_precision value: [0.95454545 0.80769231 0.91304348 0.80769231 0.91304348 0.79166667 0.83333333 0.89473684 0.90909091 0.9 ] mean value: 0.8724844777647981 key: train_precision value: [0.95588235 0.94660194 0.95544554 0.965 0.96551724 0.96984925 0.95588235 0.95215311 
0.97 0.94607843] mean value: 0.9582410221215243 key: test_recall value: [0.91304348 0.91304348 0.91304348 0.91304348 0.91304348 0.86363636 0.90909091 0.77272727 0.90909091 0.81818182] mean value: 0.8837944664031621 key: train_recall value: [0.96534653 0.96534653 0.95544554 0.95544554 0.97029703 0.95073892 0.96059113 0.98029557 0.95566502 0.95073892] mean value: 0.9609910744769058 key: test_roc_auc value: [0.93379447 0.84288538 0.91106719 0.84288538 0.91106719 0.82312253 0.86758893 0.84288538 0.91106719 0.86561265] mean value: 0.875197628458498 key: train_roc_auc value: [0.96050578 0.95557967 0.95555528 0.96048139 0.96790714 0.96051797 0.95801834 0.96539531 0.96298103 0.94814174] mean value: 0.9595083646295663 key: test_jcc value: [0.875 0.75 0.84 0.75 0.84 0.7037037 0.76923077 0.70833333 0.83333333 0.75 ] mean value: 0.781960113960114 key: train_jcc value: [0.92417062 0.91549296 0.91469194 0.92344498 0.93779904 0.92344498 0.91981132 0.9342723 0.92822967 0.90186916] mean value: 0.9223226957377971 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02436852 0.01083708 0.01080036 0.01135302 0.01035428 0.01042557 0.01047254 0.01276612 0.01002574 0.01004267] mean value: 0.0121445894241333 key: score_time value: [0.01228738 0.009691 0.00953341 0.00930977 0.00900745 0.00928497 0.00902605 0.01075983 0.00912833 0.00952101] mean value: 0.00975492000579834 key: test_mcc value: [0.78530224 0.46640316 0.82213439 0.82213439 0.78530224 0.77865613 0.55841694 0.64752602 0.79670588 0.60000118] mean value: 0.7062582554009066 key: train_mcc value: [0.72859901 0.7001606 0.67485592 0.74815266 0.71871879 0.75811526 0.71448494 0.76814813 0.73836061 0.71961678] mean value: 0.726921270950553 key: test_accuracy value: [0.88888889 0.73333333 0.91111111 0.91111111 0.88888889 0.88888889 0.77777778 0.82222222 0.88888889 0.8 ] mean value: 0.851111111111111 key: train_accuracy value: [0.86419753 0.84938272 0.83703704 0.87407407 0.85925926 0.87901235 0.85679012 0.88395062 0.8691358 0.85925926] mean value: 0.8632098765432099 key: test_fscore value: [0.88372093 0.73913043 0.91304348 0.91304348 0.88372093 0.88888889 0.7826087 0.82608696 0.87179487 0.79069767] mean value: 0.8492736339045742 key: train_fscore value: [0.86215539 0.84398977 0.83248731 0.87344913 0.85714286 0.87841191 0.85353535 0.88279302 0.86848635 0.8556962 ] mean value: 0.8608147293143978 key: test_precision value: [0.95 0.73913043 0.91304348 0.91304348 0.95 0.86956522 0.75 0.79166667 1. 
0.80952381] mean value: 0.8685973084886128 key: train_precision value: [0.87309645 0.87301587 0.85416667 0.87562189 0.8680203 0.885 0.87564767 0.89393939 0.875 0.88020833] mean value: 0.8753716577165349 key: test_recall value: [0.82608696 0.73913043 0.91304348 0.91304348 0.82608696 0.90909091 0.81818182 0.86363636 0.77272727 0.77272727] mean value: 0.8353754940711462 key: train_recall value: [0.85148515 0.81683168 0.81188119 0.87128713 0.84653465 0.87192118 0.83251232 0.87192118 0.86206897 0.83251232] mean value: 0.8468955762571331 key: test_roc_auc value: [0.89031621 0.73320158 0.91106719 0.91106719 0.89031621 0.88932806 0.77865613 0.82312253 0.88636364 0.79940711] mean value: 0.8512845849802372 key: train_roc_auc value: [0.86416622 0.84930254 0.83697508 0.87406721 0.85922792 0.8790299 0.85685022 0.88398039 0.86915329 0.85932546] mean value: 0.8632078232453787 key: test_jcc value: [0.79166667 0.5862069 0.84 0.84 0.79166667 0.8 0.64285714 0.7037037 0.77272727 0.65384615] mean value: 0.7422674503019331 key: train_jcc value: [0.75770925 0.7300885 0.71304348 0.7753304 0.75 0.78318584 0.74449339 0.79017857 0.76754386 0.74778761] mean value: 0.7559360895888796 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01528907 0.02077127 0.01836395 0.01844668 0.01875925 0.02194405 0.01897907 0.02040768 0.02031064 0.02082705] mean value: 0.019409871101379393 key: score_time value: [0.00912023 0.01180029 0.01194692 0.01217914 0.01215005 0.01223612 0.01204348 0.01269436 0.01220083 0.01210046] mean value: 0.011847186088562011 key: test_mcc value: [0.78530224 0.64752602 0.86732843 0.73320158 0.59725988 0.78405645 0.70780516 0.64752602 0.82213439 0.70501339] mean value: 0.7297153566397614 key: train_mcc value: [0.86377146 0.84895551 0.82265468 0.87431362 0.80684222 0.81706101 0.86843671 0.82696893 0.88164702 0.87837337] mean value: 0.8489024517724476 key: test_accuracy value: [0.88888889 0.82222222 0.93333333 0.86666667 0.77777778 0.88888889 0.84444444 0.82222222 0.91111111 0.84444444] mean value: 0.86 key: train_accuracy value: [0.9308642 0.92098765 0.90864198 0.93580247 0.8962963 0.90123457 0.93333333 0.90864198 0.94074074 0.9382716 ] mean value: 0.9214814814814815 key: test_fscore value: [0.88372093 0.81818182 0.93617021 0.86956522 0.73684211 0.87804878 0.85714286 0.82608696 0.90909091 0.82051282] mean value: 0.8535362607590927 key: train_fscore value: [0.92820513 0.91534392 0.91334895 0.93298969 0.8852459 0.89130435 0.93556086 0.91533181 0.94146341 0.93638677] mean value: 0.9195180779922804 key: test_precision value: [0.95 0.85714286 0.91666667 0.86956522 0.93333333 0.94736842 0.77777778 0.79166667 0.90909091 0.94117647] mean value: 0.8893788319710382 key: train_precision value: [0.96276596 0.98295455 0.86666667 0.97311828 0.98780488 0.99393939 
0.90740741 0.85470085 0.93236715 0.96842105] mean value: 0.9430146185624383 key: test_recall value: [0.82608696 0.7826087 0.95652174 0.86956522 0.60869565 0.81818182 0.95454545 0.86363636 0.90909091 0.72727273] mean value: 0.8316205533596838 key: train_recall value: [0.8960396 0.85643564 0.96534653 0.8960396 0.8019802 0.80788177 0.96551724 0.98522167 0.95073892 0.90640394] mean value: 0.9031605130956446 key: test_roc_auc value: [0.89031621 0.82312253 0.93280632 0.86660079 0.78162055 0.88735178 0.84683794 0.82312253 0.91106719 0.84189723] mean value: 0.8604743083003953 key: train_roc_auc value: [0.93077842 0.92082866 0.90878164 0.93570453 0.89606399 0.90146564 0.93325367 0.90845242 0.94071599 0.93835049] mean value: 0.9214395454323757 key: test_jcc value: [0.79166667 0.69230769 0.88 0.76923077 0.58333333 0.7826087 0.75 0.7037037 0.83333333 0.69565217] mean value: 0.7481836368140716 key: train_jcc value: [0.86602871 0.84390244 0.84051724 0.87439614 0.79411765 0.80392157 0.87892377 0.84388186 0.88940092 0.88038278] mean value: 0.8515473059624479 MCC on Blind test: 0.75 Accuracy on Blind test: 0.87 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.023592 0.01849461 0.01916718 0.02027512 0.01857185 0.01852775 0.01686931 0.01843333 0.02005339 0.01996136] mean value: 0.019394588470458985 key: score_time value: [0.00936127 0.01204491 0.01203346 0.01205182 0.01208186 0.01245189 0.01205373 0.01204967 0.01202798 0.01207948] mean value: 0.011823606491088868 key: test_mcc value: [0.76206649 0.38361073 0.69583743 0.73320158 0.73320158 0.87476705 0.70780516 0.60637261 0.72645449 0.82574419] mean value: 0.7049061319646861 key: train_mcc value: [0.71591321 0.28354195 0.87676217 0.89347743 0.88642848 0.89639025 0.81462126 0.75056333 0.63579921 0.85520525] mean value: 0.760870254618254 key: test_accuracy value: [0.86666667 0.62222222 0.84444444 0.86666667 0.86666667 0.93333333 0.84444444 0.8 0.84444444 0.91111111] mean value: 0.84 key: train_accuracy value: [0.84197531 0.57530864 0.93580247 0.94567901 0.94320988 0.94814815 0.9037037 0.86419753 0.79012346 0.92592593] mean value: 0.8674074074074074 key: test_fscore value: [0.85 0.4137931 0.8372093 0.86956522 0.86956522 0.93617021 0.85714286 0.80851064 0.8627451 0.91304348] mean value: 0.8217745125063238 key: train_fscore value: [0.81395349 0.25862069 0.93193717 0.94358974 0.94292804 0.94865526 0.90993072 0.87912088 0.82617587 0.92924528] mean value: 0.8384157138013564 key: test_precision value: [1. 1. 0.9 0.86956522 0.86956522 0.88 0.77777778 0.76 0.75862069 0.875 ] mean value: 0.8690528902215559 key: train_precision value: [0.98591549 1. 
0.98888889 0.9787234 0.94527363 0.94174757 0.85652174 0.79365079 0.70629371 0.89140271] mean value: 0.9088417944765346 key: test_recall value: [0.73913043 0.26086957 0.7826087 0.86956522 0.86956522 1. 0.95454545 0.86363636 1. 0.95454545] mean value: 0.8294466403162055 key: train_recall value: [0.69306931 0.14851485 0.88118812 0.91089109 0.94059406 0.95566502 0.97044335 0.98522167 0.99507389 0.97044335] mean value: 0.8451104716382969 key: test_roc_auc value: [0.86956522 0.63043478 0.8458498 0.86660079 0.86660079 0.93478261 0.84683794 0.8013834 0.84782609 0.91205534] mean value: 0.842193675889328 key: train_roc_auc value: [0.84160855 0.57425743 0.93566795 0.94559333 0.94320343 0.94812954 0.90353851 0.86389797 0.78961615 0.92581573] mean value: 0.8671328586060576 key: test_jcc value: [0.73913043 0.26086957 0.72 0.76923077 0.76923077 0.88 0.75 0.67857143 0.75862069 0.84 ] mean value: 0.7165653656688139 key: train_jcc value: [0.68627451 0.14851485 0.87254902 0.89320388 0.89201878 0.90232558 0.83474576 0.78431373 0.70383275 0.86784141] mean value: 0.7585620275637062 MCC on Blind test: 0.75 Accuracy on Blind test: 0.88 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18216228 0.1695621 0.17470765 0.17133498 0.17510653 0.1782279 0.17793107 0.173594 0.16895962 0.1765089 ] mean value: 0.17480950355529784 key: score_time value: [0.01529431 0.01603556 0.01582551 0.01555681 0.01668692 0.01690388 0.01652122 0.0161798 0.01550436 0.01662135] mean value: 0.016112971305847167 key: test_mcc value: [0.86758893 0.82213439 0.95643752 1. 0.86758893 0.87476705 0.95652174 0.86732843 1. 1. ] mean value: 0.9212366993733405 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93333333 0.91111111 0.97777778 1. 0.93333333 0.93333333 0.97777778 0.93333333 1. 1. ] mean value: 0.96 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.91304348 0.9787234 1. 0.93333333 0.93617021 0.97777778 0.93023256 1. 1. ] mean value: 0.9602614097866126 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95454545 0.91304348 0.95833333 1. 0.95454545 0.88 0.95652174 0.95238095 1. 1. ] mean value: 0.95693704121965 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.91304348 1. 1. 0.91304348 1. 1. 0.90909091 1. 1. ] mean value: 0.9648221343873518 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93379447 0.91106719 0.97727273 1. 0.93379447 0.93478261 0.97826087 0.93280632 1. 1. ] mean value: 0.9601778656126483 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.875 0.84 0.95833333 1. 0.875 0.88 0.95652174 0.86956522 1. 1. ] mean value: 0.9254420289855072 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05877137 0.06073856 0.05816627 0.06933212 0.05485106 0.0548265 0.07156682 0.08174944 0.06895924 0.07406449] mean value: 0.06530258655548096 key: score_time value: [0.01848102 0.02689838 0.02581835 0.02910113 0.01847243 0.02071047 0.03578973 0.02482224 0.03958559 0.02383661] mean value: 0.02635159492492676 key: test_mcc value: [0.82213439 0.91106719 0.91106719 1. 0.86758893 0.91485328 0.91485328 0.82213439 1. 0.87406293] mean value: 0.9037761587267465 key: train_mcc value: [0.98024679 0.98519693 0.98519693 0.99017145 0.97560447 0.98029413 0.98024679 0.99507389 0.98029509 0.98519729] mean value: 0.9837523754632594 key: test_accuracy value: [0.91111111 0.95555556 0.95555556 1. 0.93333333 0.95555556 0.95555556 0.91111111 1. 
0.93333333] mean value: 0.9511111111111111 key: train_accuracy value: [0.99012346 0.99259259 0.99259259 0.99506173 0.98765432 0.99012346 0.99012346 0.99753086 0.99012346 0.99259259] mean value: 0.9918518518518519 key: test_fscore value: [0.91304348 0.95652174 0.95652174 1. 0.93333333 0.95652174 0.95652174 0.90909091 1. 0.92682927] mean value: 0.9508383945499534 key: train_fscore value: [0.99009901 0.99255583 0.99255583 0.99502488 0.98746867 0.99019608 0.99014778 0.99753086 0.99009901 0.99259259] mean value: 0.9918270548106813 key: test_precision value: [0.91304348 0.95652174 0.95652174 1. 0.95454545 0.91666667 0.91666667 0.90909091 1. 1. ] mean value: 0.9523056653491436 key: train_precision value: [0.99009901 0.99502488 0.99502488 1. 1. 0.98536585 0.99014778 1. 0.99502488 0.9950495 ] mean value: 0.9945736778626925 key: test_recall value: [0.91304348 0.95652174 0.95652174 1. 0.91304348 1. 1. 0.90909091 1. 0.86363636] mean value: 0.9511857707509881 key: train_recall value: [0.99009901 0.99009901 0.99009901 0.99009901 0.97524752 0.99507389 0.99014778 0.99507389 0.98522167 0.99014778] mean value: 0.9891308588986978 key: test_roc_auc value: [0.91106719 0.9555336 0.9555336 1. 0.93379447 0.95652174 0.95652174 0.91106719 1. 0.93181818] mean value: 0.9511857707509882 key: train_roc_auc value: [0.9901234 0.99258645 0.99258645 0.9950495 0.98762376 0.9901112 0.9901234 0.99753695 0.99013559 0.99259864] mean value: 0.9918475345071454 key: test_jcc value: [0.84 0.91666667 0.91666667 1. 0.875 0.91666667 0.91666667 0.83333333 1. 
0.86363636] mean value: 0.9078636363636363 key: train_jcc value: [0.98039216 0.98522167 0.98522167 0.99009901 0.97524752 0.98058252 0.9804878 0.99507389 0.98039216 0.98529412] mean value: 0.9838012536555218 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.11244583 0.14630103 0.19385266 0.16284895 0.19661665 0.19063306 0.19858932 0.12693048 0.16871762 0.17700362] mean value: 0.16739392280578613 key: score_time value: [0.04026365 0.02383375 0.02751327 0.02340198 0.03419042 0.0237093 0.02365375 0.02253604 0.02923608 0.02369428] mean value: 0.02720324993133545 key: test_mcc value: [0.56604076 0.55841694 0.63358389 0.6133209 0.73663511 0.69156407 0.37747036 0.55533597 0.687125 0.64613475] mean value: 0.6065627735892382 key: train_mcc value: [0.99017145 0.99017145 0.98529269 0.98529269 0.98529269 0.98529376 0.99017193 0.99017193 0.99507389 0.99017193] mean value: 0.9887104432367875 key: test_accuracy value: [0.77777778 0.77777778 0.8 0.8 0.86666667 0.82222222 0.68888889 0.77777778 0.82222222 0.82222222] mean value: 0.7955555555555556 key: train_accuracy value: [0.99506173 0.99506173 0.99259259 0.99259259 0.99259259 0.99259259 0.99506173 0.99506173 
0.99753086 0.99506173] mean value: 0.994320987654321 key: test_fscore value: [0.76190476 0.77272727 0.76923077 0.82352941 0.86363636 0.84615385 0.68181818 0.77272727 0.77777778 0.80952381] mean value: 0.7879029467264761 key: train_fscore value: [0.99502488 0.99502488 0.9925187 0.9925187 0.9925187 0.99255583 0.9950495 0.9950495 0.99753086 0.9950495 ] mean value: 0.9942841071283992 key: test_precision value: [0.84210526 0.80952381 0.9375 0.75 0.9047619 0.73333333 0.68181818 0.77272727 1. 0.85 ] mean value: 0.8281769765322397 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.69565217 0.73913043 0.65217391 0.91304348 0.82608696 1. 0.68181818 0.77272727 0.63636364 0.77272727] mean value: 0.7689723320158103 key: train_recall value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167 0.99014778 0.99014778 0.99507389 0.99014778] mean value: 0.9886382480612593 key: test_roc_auc value: [0.77964427 0.77865613 0.80335968 0.79743083 0.86758893 0.82608696 0.68873518 0.77766798 0.81818182 0.82114625] mean value: 0.7958498023715415 key: train_roc_auc value: [0.9950495 0.9950495 0.99257426 0.99257426 0.99257426 0.99261084 0.99507389 0.99507389 0.99753695 0.99507389] mean value: 0.9943191240306297 key: test_jcc value: [0.61538462 0.62962963 0.625 0.7 0.76 0.73333333 0.51724138 0.62962963 0.63636364 0.68 ] mean value: 0.652658222365119 key: train_jcc value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167 0.99014778 0.99014778 0.99507389 0.99014778] mean value: 0.9886382480612593 MCC on Blind test: 0.59 Accuracy on Blind test: 0.79 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), 
('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 
'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.66729116 0.64972425 0.65732384 0.6665225 0.65949416 0.65429211 0.65473723 0.65853596 0.664325 0.65511894] mean value: 0.6587365150451661 key: score_time value: [0.01024175 0.00957489 0.00951242 0.00985265 0.00966835 0.00935936 0.00952435 0.00940204 0.00994349 0.00998735] mean value: 0.009706664085388183 key: test_mcc value: [0.82213439 0.86732843 0.95643752 1. 0.82574419 0.91485328 0.91485328 0.77821935 1. 0.95643752] mean value: 0.9036007956373757 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.93333333 0.97777778 1. 0.91111111 0.95555556 0.95555556 0.88888889 1. 0.97777778] mean value: 0.9511111111111111 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.91304348 0.93617021 0.9787234 1. 0.90909091 0.95652174 0.95652174 0.88372093 1. 0.97674419] mean value: 0.9510536598912994 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91304348 0.91666667 0.95833333 1. 0.95238095 0.91666667 0.91666667 0.9047619 1. 1. ] mean value: 0.947851966873706 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.95652174 1. 1. 0.86956522 1. 1. 0.86363636 1. 0.95454545] mean value: 0.9557312252964427 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91106719 0.93280632 0.97727273 1. 0.91205534 0.95652174 0.95652174 0.88833992 1. 
0.97727273] mean value: 0.9511857707509881 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.84 0.88 0.95833333 1. 0.83333333 0.91666667 0.91666667 0.79166667 1. 0.95454545] mean value: 0.9091212121212121 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.06929517 0.06151557 0.07104921 0.03313804 0.04500461 0.03204679 0.03788662 0.03866816 0.03848624 0.04184365] mean value: 0.04689340591430664 key: score_time value: [0.01687455 0.02114534 0.02216387 0.01866746 0.01421213 0.0171454 0.01545548 0.01302433 0.01309681 0.01861858] mean value: 0.017040395736694337 key: test_mcc value: [0.65604724 0.35497208 0.2540839 0.46640316 0.21191154 0.11393242 0.33797818 0.15717365 0.4000988 0.06320859] mean value: 0.3015809563232226 key: train_mcc value: [0.95177249 0.72466772 0.97079432 0.90113034 0.8354634 0.62435788 0.97541464 0.76507358 0.89222145 0.77839025] mean value: 0.8419286071129487 key: test_accuracy value: [0.82222222 0.66666667 0.62222222 0.73333333 0.6 0.55555556 0.66666667 0.57777778 0.66666667 0.53333333] mean value: 0.6444444444444444 key: train_accuracy value: [0.97530864 0.84444444 0.98518519 0.94814815 0.91111111 0.78024691 0.98765432 0.8691358 
0.94320988 0.88395062] mean value: 0.9128395061728395 key: test_fscore value: [0.80952381 0.61538462 0.58536585 0.73913043 0.55 0.41176471 0.68085106 0.48648649 0.51612903 0.4 ] mean value: 0.5794636001806261 key: train_fscore value: [0.97461929 0.81524927 0.98492462 0.94516971 0.90217391 0.7192429 0.98777506 0.84985836 0.93994778 0.87399464] mean value: 0.8992955544177024 key: test_precision value: [0.89473684 0.75 0.66666667 0.73913043 0.64705882 0.58333333 0.64 0.6 0.88888889 0.53846154] mean value: 0.694827652776771 key: train_precision value: [1. 1. 1. 1. 1. 1. 0.98058252 1. 1. 0.95882353] mean value: 0.9939406053683609 key: test_recall value: [0.73913043 0.52173913 0.52173913 0.73913043 0.47826087 0.31818182 0.72727273 0.40909091 0.36363636 0.31818182] mean value: 0.5136363636363637 key: train_recall value: [0.95049505 0.68811881 0.97029703 0.8960396 0.82178218 0.56157635 0.99507389 0.73891626 0.88669951 0.80295567] mean value: 0.8311954348144174 key: test_roc_auc value: [0.82411067 0.66996047 0.62450593 0.73320158 0.6027668 0.55039526 0.66798419 0.57411067 0.66007905 0.52865613] mean value: 0.6435770750988142 key: train_roc_auc value: [0.97524752 0.84405941 0.98514851 0.9480198 0.91089109 0.78078818 0.98763596 0.86945813 0.94334975 0.8841511 ] mean value: 0.9128749451299809 key: test_jcc value: [0.68 0.44444444 0.4137931 0.5862069 0.37931034 0.25925926 0.51612903 0.32142857 0.34782609 0.25 ] mean value: 0.41983977391744476 key: train_jcc value: [0.95049505 0.68811881 0.97029703 0.8960396 0.82178218 0.56157635 0.97584541 0.73891626 0.88669951 0.77619048] mean value: 0.8265960678312423 MCC on Blind test: 0.49 Accuracy on Blind test: 0.74 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02726126 0.04403973 0.06417942 0.03625083 0.05212307 0.04398036 0.05202127 0.05687022 0.04713082 0.05117822] mean value: 0.04750351905822754 key: score_time value: [0.02289844 0.02290368 0.03534818 0.02457857 0.02440238 0.01246762 0.02006912 0.02470255 0.02361917 0.02284813] mean value: 0.023383784294128417 key: test_mcc value: [0.77865613 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613 0.73663511 0.68911026 0.95652174 0.73559956] mean value: 0.7876062675297144 key: train_mcc value: [0.86211613 0.8717805 0.87664317 0.86667805 0.89175679 0.87164354 0.871768 0.871768 0.86176621 0.85704185] mean value: 0.8702962246613828 key: test_accuracy value: [0.88888889 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889 0.86666667 0.84444444 0.97777778 0.86666667] mean value: 0.8933333333333333 key: train_accuracy value: [0.9308642 0.93580247 0.9382716 0.93333333 0.94567901 0.93580247 0.93580247 0.93580247 0.9308642 0.92839506] mean value: 0.9350617283950617 key: test_fscore value: [0.88888889 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889 0.86956522 0.8372093 0.97777778 0.85714286] mean value: 0.8931868862110025 key: train_fscore value: [0.93170732 0.93627451 0.93857494 0.93333333 0.94634146 0.93627451 0.93658537 0.93658537 0.93137255 0.92944039] mean value: 0.9356489742025249 key: test_precision value: [0.90909091 0.875 0.91666667 0.86956522 0.91304348 0.86956522 0.83333333 0.85714286 0.95652174 0.9 ] mean value: 0.8899929418407679 key: train_precision value: 
[0.91826923 0.92718447 0.93170732 0.93103448 0.93269231 0.93170732 0.92753623 0.92753623 0.92682927 0.91826923] mean value: 0.9272766084215948 key: test_recall value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091 0.90909091 0.81818182 1. 0.81818182] mean value: 0.8976284584980238 key: train_recall value: [0.94554455 0.94554455 0.94554455 0.93564356 0.96039604 0.9408867 0.94581281 0.94581281 0.93596059 0.9408867 ] mean value: 0.9442032873238062 key: test_roc_auc value: [0.88932806 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806 0.86758893 0.84387352 0.97826087 0.86561265] mean value: 0.8932806324110671 key: train_roc_auc value: [0.93090036 0.93582646 0.93828952 0.93333902 0.94571526 0.93578988 0.93577769 0.93577769 0.93085158 0.92836414] mean value: 0.9350631614885626 key: test_jcc value: [0.8 0.80769231 0.88 0.76923077 0.84 0.8 0.76923077 0.72 0.95652174 0.75 ] mean value: 0.8092675585284281 key: train_jcc value: [0.87214612 0.88018433 0.88425926 0.875 0.89814815 0.88018433 0.88073394 0.88073394 0.87155963 0.86818182] mean value: 0.8791131530840937 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.45930314 0.3597424 0.36897445 0.41965199 0.49196625 0.47425413 0.46620059 0.42436361 0.34532022 0.39895368] mean value: 0.4208730459213257 key: score_time value: [0.02300835 0.02374816 0.02298093 0.02300811 0.0216465 0.02457333 0.01253843 0.0218122 0.02498055 0.02483153] mean value: 0.0223128080368042 key: test_mcc value: [0.77865613 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613 0.73663511 0.64613475 0.95652174 0.73559956] mean value: 0.7833087165986725 key: train_mcc value: [0.86211613 0.8717805 0.87664317 0.86667805 0.93581427 0.87164354 0.80250226 0.92620337 0.81237958 0.85704185] mean value: 0.8682802726942564 key: test_accuracy value: [0.88888889 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889 0.86666667 0.82222222 0.97777778 0.86666667] mean value: 0.8911111111111111 key: train_accuracy value: [0.9308642 0.93580247 0.9382716 0.93333333 0.96790123 0.93580247 0.90123457 0.96296296 0.90617284 0.92839506] mean value: 0.9340740740740741 key: test_fscore value: [0.88888889 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889 0.86956522 0.80952381 0.97777778 0.85714286] mean value: 0.8904183369308254 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:128: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:131: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.93170732 0.93627451 0.93857494 0.93333333 0.96790123 0.93627451 0.90196078 0.96350365 0.90686275 0.92944039] mean value: 0.9345833411498392 key: test_precision value: [0.90909091 0.875 0.91666667 0.86956522 0.91304348 0.86956522 0.83333333 0.85 0.95652174 0.9 ] mean value: 0.8892786561264822 key: train_precision value: [0.91826923 0.92718447 0.93170732 0.93103448 0.96551724 0.93170732 0.89756098 0.95192308 0.90243902 0.91826923] mean value: 0.9275612362765229 key: test_recall value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091 0.90909091 0.77272727 1. 0.81818182] mean value: 0.8930830039525691 key: train_recall value: [0.94554455 0.94554455 0.94554455 0.93564356 0.97029703 0.9408867 0.90640394 0.97536946 0.91133005 0.9408867 ] mean value: 0.9417451104716383 key: test_roc_auc value: [0.88932806 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806 0.86758893 0.82114625 0.97826087 0.86561265] mean value: 0.8910079051383399 key: train_roc_auc value: [0.93090036 0.93582646 0.93828952 0.93333902 0.96790714 0.93578988 0.90122177 0.96293225 0.90616007 0.92836414] mean value: 0.9340730624786616 key: test_jcc value: [0.8 0.80769231 0.88 0.76923077 0.84 0.8 0.76923077 0.68 0.95652174 0.75 ] mean value: 0.8052675585284281 key: train_jcc value: [0.87214612 0.88018433 0.88425926 0.875 0.93779904 0.88018433 0.82142857 0.92957746 0.82959641 0.86818182] mean value: 0.8778357351592567 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04131389 0.09903002 0.04775429 0.06405687 0.17856431 0.06696653 0.1108768 0.04235101 0.0490725 0.04135633]
mean value: 0.0741342544555664

key: score_time
value: [0.01273394 0.02248192 0.02041554 0.01248813 0.01281857 0.01917338 0.01525044 0.01229382 0.01236773 0.01228189]
mean value: 0.015230536460876465

key: test_mcc
value: [0.77865613 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419 0.69583743 0.68911026 0.95652174 0.82213439]
mean value: 0.7925409622270596

key: train_mcc
value: [0.86190245 0.8716498 0.86172755 0.86177295 0.87664317 0.85188889 0.86188899 0.87680228 0.85679795 0.86176621]
mean value: 0.8642840257675023

key: test_accuracy
value: [0.88888889 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111 0.84444444 0.84444444 0.97777778 0.91111111]
mean value: 0.8955555555555555

key: train_accuracy
value: [0.9308642 0.93580247 0.9308642 0.9308642 0.9382716 0.92592593 0.9308642 0.9382716 0.92839506 0.9308642 ]
mean value: 0.9320987654320988

key: test_fscore
value: [0.88888889 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348 0.85106383 0.8372093 0.97777778 0.90909091]
mean value: 0.8966281710028886

key: train_fscore
value: [0.93137255 0.93596059 0.93069307 0.93103448 0.93857494 0.92647059 0.93170732 0.93917275 0.92874693 0.93137255]
mean value: 0.9325105763259832

key: test_precision
value: [0.90909091 0.82608696 0.91666667 0.875 0.95454545 0.875 0.8 0.85714286 0.95652174 0.90909091]
mean value: 0.887914549218897

key: train_precision
value: [0.9223301 0.93137255 0.93069307 0.92647059 0.93170732 0.92195122 0.92270531 0.92788462 0.92647059 0.92682927]
mean value: 0.9268414626156831

key: test_recall
value: [0.86956522 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545 0.90909091 0.81818182 1. 0.90909091]
mean value: 0.9069169960474308

key: train_recall
value: [0.94059406 0.94059406 0.93069307 0.93564356 0.94554455 0.93103448 0.9408867 0.95073892 0.93103448 0.93596059]
mean value: 0.9382724479344486

key: test_roc_auc
value: [0.88932806 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534 0.8458498 0.84387352 0.97826087 0.91106719]
mean value: 0.8957509881422925

key: train_roc_auc
value: [0.93088816 0.93581427 0.93086378 0.93087597 0.93828952 0.92591328 0.93083939 0.93824075 0.92838853 0.93085158]
mean value: 0.9320965224601278

key: test_jcc
value: [0.8 0.7037037 0.88 0.80769231 0.875 0.84 0.74074074 0.72 0.95652174 0.83333333]
mean value: 0.815699182460052

key: train_jcc
value: [0.87155963 0.87962963 0.87037037 0.87096774 0.88425926 0.8630137 0.87214612 0.8853211 0.86697248 0.87155963]
mean value: 0.8735799662583039

MCC on Blind test: 0.8
Accuracy on Blind test: 0.9

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [2.01325941 1.78600645 1.50203204 1.71028256 1.41295767 1.62343645 1.50840998 1.89245319 1.44183517 1.61304617] mean value: 1.6503719091415405 key: score_time value: [0.01474905 0.01609755 0.02074885 0.02179193 0.01591682 0.01258969 0.01645231 0.02142692 0.01871753 0.01511955] mean value: 0.017361021041870116 key: test_mcc value: [0.82574419 0.68911026 0.86732843 0.77821935 0.82574419 0.82574419 0.73663511 0.68911026 1. 0.77821935] mean value: 0.8015855339402445 key: train_mcc value: [0.8965753 0.89630533 0.89630786 0.89139819 0.90627515 0.82225691 0.89630533 0.91128405 0.88164702 0.89152603] mean value: 0.8889881173544825 key: test_accuracy value: [0.91111111 0.84444444 0.93333333 0.88888889 0.91111111 0.91111111 0.86666667 0.84444444 1. 0.88888889] mean value: 0.9 key: train_accuracy value: [0.94814815 0.94814815 0.94814815 0.94567901 0.95308642 0.91111111 0.94814815 0.95555556 0.94074074 0.94567901] mean value: 0.9444444444444444 key: test_fscore value: [0.90909091 0.85106383 0.93617021 0.89361702 0.90909091 0.91304348 0.86956522 0.8372093 1. 0.88372093] mean value: 0.9002571810221919 key: train_fscore value: [0.94865526 0.94789082 0.94814815 0.94527363 0.95331695 0.91176471 0.94840295 0.95609756 0.94146341 0.94634146] mean value: 0.9447354902197866 key: test_precision value: [0.95238095 0.83333333 0.91666667 0.875 0.95238095 0.875 0.83333333 0.85714286 1. 
0.9047619 ] mean value: 0.9 key: train_precision value: [0.93719807 0.95024876 0.94581281 0.95 0.94634146 0.90731707 0.94607843 0.9468599 0.93236715 0.93719807] mean value: 0.939942172046439 key: test_recall value: [0.86956522 0.86956522 0.95652174 0.91304348 0.86956522 0.95454545 0.90909091 0.81818182 1. 0.86363636] mean value: 0.9023715415019763 key: train_recall value: [0.96039604 0.94554455 0.95049505 0.94059406 0.96039604 0.91625616 0.95073892 0.96551724 0.95073892 0.95566502] mean value: 0.9496341998731893 key: test_roc_auc value: [0.91205534 0.84387352 0.93280632 0.88833992 0.91205534 0.91205534 0.86758893 0.84387352 1. 0.88833992] mean value: 0.900098814229249 key: train_roc_auc value: [0.94817832 0.94814174 0.94815393 0.94566649 0.95310442 0.91109838 0.94814174 0.9555309 0.94071599 0.94565429] mean value: 0.944438618738721 key: test_jcc value: [0.83333333 0.74074074 0.88 0.80769231 0.83333333 0.84 0.76923077 0.72 1. 0.79166667] mean value: 0.8215997150997151 key: train_jcc value: [0.90232558 0.9009434 0.90140845 0.89622642 0.91079812 0.83783784 0.90186916 0.91588785 0.88940092 0.89814815] mean value: 0.8954845882476823 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive 
Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01473141 0.01323748 0.01226425 0.01224113 0.01230812 0.01203942 0.01149964 0.0120852 0.01192832 0.01190805] mean value: 0.012424302101135255 key: score_time value: [0.01256609 0.0105772 0.01097941 0.01110196 0.01087761 0.01021123 0.0105443 0.01062822 0.01049519 0.01022792] mean value: 0.010820913314819335 key: test_mcc value: [0.77865613 0.60079051 0.60079051 0.65604724 0.70780516 0.73320158 0.42993591 0.60000118 0.59109821 0.64613475] mean value: 0.6344461181857074 key: train_mcc value: [0.67799996 0.63062266 0.64891459 0.66881392 0.69328869 0.66530582 0.64321841 0.71511705 0.67817152 0.66530582] mean value: 0.6686758440784181 key: test_accuracy value: [0.88888889 0.8 0.8 0.82222222 0.84444444 0.86666667 0.71111111 0.8 0.77777778 0.82222222] mean value: 0.8133333333333334 key: train_accuracy value: [0.83703704 0.81234568 0.82222222 0.83209877 0.84444444 0.82962963 0.81234568 0.85432099 0.83703704 0.82962963] mean value: 0.8311111111111111 key: test_fscore value: [0.88888889 0.8 0.8 0.80952381 0.82926829 0.86363636 0.66666667 0.79069767 0.72222222 0.80952381] mean value: 0.7980427727563293 key: train_fscore value: [0.82722513 0.79787234 0.81052632 0.82105263 0.83464567 0.81794195 0.7877095 0.84432718 0.828125 0.81794195] mean value: 0.8187367666976243 key: test_precision value: [0.90909091 0.81818182 0.81818182 0.89473684 0.94444444 0.86363636 0.76470588 0.80952381 0.92857143 0.85 ] mean value: 0.8601073316088796 key: train_precision value: [0.87777778 0.86206897 0.86516854 0.87640449 0.88826816 0.88068182 0.90967742 0.90909091 0.87845304 0.88068182] mean value: 
0.8828272936910883 key: test_recall value: [0.86956522 0.7826087 0.7826087 0.73913043 0.73913043 0.86363636 0.59090909 0.77272727 0.59090909 0.77272727] mean value: 0.750395256916996 key: train_recall value: [0.78217822 0.74257426 0.76237624 0.77227723 0.78712871 0.7635468 0.69458128 0.78817734 0.78325123 0.7635468 ] mean value: 0.7639638101741208 key: test_roc_auc value: [0.88932806 0.80039526 0.80039526 0.82411067 0.84683794 0.86660079 0.70849802 0.79940711 0.77371542 0.82114625] mean value: 0.8130434782608695 key: train_roc_auc value: [0.83690192 0.81217383 0.82207482 0.83195142 0.84430327 0.8297932 0.81263718 0.85448471 0.83717017 0.8297932 ] mean value: 0.8311283714578355 key: test_jcc value: [0.8 0.66666667 0.66666667 0.68 0.70833333 0.76 0.5 0.65384615 0.56521739 0.68 ] mean value: 0.6680730211817169 key: train_jcc value: [0.70535714 0.66371681 0.68141593 0.69642857 0.71621622 0.69196429 0.64976959 0.73059361 0.70666667 0.69196429] mean value: 0.6934093104519392 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01203966 0.01195383 0.01233935 0.01260471 0.01238465 0.01232862 0.01189423 0.01254082 0.01260471 0.0126884 ] mean value: 0.012337899208068848 key: score_time value: [0.01046038 0.01066375 0.01018858 0.01057577 0.01045346 0.0106113 0.01064634 0.01088023 0.01084447 0.01107097] mean value: 0.010639524459838868 key: test_mcc value: [0.70780516 0.4229249 0.68972332 0.73559956 0.78530224 0.69583743 0.55841694 0.64426877 0.69404997 0.55841694] mean value: 0.6492345229004394 key: train_mcc value: [0.7284056 0.69072841 0.70964919 0.73836061 0.7234551 0.75324391 0.75343373 0.76814813 0.70374345 0.72863208] mean value: 0.7297800213531322 key: test_accuracy value: [0.84444444 0.71111111 0.84444444 0.86666667 0.88888889 0.84444444 0.77777778 0.82222222 0.84444444 0.77777778] mean value: 0.8222222222222222 key: train_accuracy value: [0.86419753 0.84444444 0.85432099 0.8691358 0.8617284 0.87654321 0.87654321 0.88395062 0.85185185 0.86419753] mean value: 0.8646913580246913 key: test_fscore value: [0.82926829 0.71111111 0.84444444 0.875 0.88372093 0.85106383 0.7826087 0.81818182 0.82926829 0.7826087 ] mean value: 0.8207276110427367 key: train_fscore value: [0.86419753 0.83804627 0.84987277 0.86977887 0.86138614 0.87562189 0.875 0.88279302 0.85148515 0.86284289] mean value: 0.8631024534573951 key: test_precision value: [0.94444444 0.72727273 0.86363636 0.84 0.95 0.8 0.75 0.81818182 0.89473684 0.75 ] mean value: 0.8338272195640617 key: train_precision value: [0.86206897 0.87165775 0.87434555 0.86341463 0.86138614 0.88442211 0.88832487 0.89393939 0.85572139 0.87373737] mean value: 
0.8729018186387163 key: test_recall value: [0.73913043 0.69565217 0.82608696 0.91304348 0.82608696 0.90909091 0.81818182 0.81818182 0.77272727 0.81818182] mean value: 0.8136363636363636 key: train_recall value: [0.86633663 0.80693069 0.82673267 0.87623762 0.86138614 0.86699507 0.86206897 0.87192118 0.84729064 0.85221675] mean value: 0.8538116373213676 key: test_roc_auc value: [0.84683794 0.71146245 0.84486166 0.86561265 0.89031621 0.8458498 0.77865613 0.82213439 0.84288538 0.77865613] mean value: 0.8227272727272728 key: train_roc_auc value: [0.8642028 0.84435205 0.85425304 0.86915329 0.86172755 0.87656684 0.87657904 0.88398039 0.85186314 0.86422719] mean value: 0.864690533092718 key: test_jcc value: [0.70833333 0.55172414 0.73076923 0.77777778 0.79166667 0.74074074 0.64285714 0.69230769 0.70833333 0.64285714] mean value: 0.6987367198574095 key: train_jcc value: [0.76086957 0.72123894 0.73893805 0.76956522 0.75652174 0.77876106 0.77777778 0.79017857 0.74137931 0.75877193] mean value: 0.7594002164212214 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01207423 0.01183081 0.01203465 0.01215744 0.01158547 0.01139283 0.01193786 0.01136422 0.01127291 0.01138806] mean value: 0.011703848838806152 key: score_time value: [0.01589179 0.01923251 0.01931143 0.01723266 0.01773953 0.01866388 0.01476097 0.01722598 0.01753187 0.01763558] mean value: 0.017522621154785156 key: test_mcc value: [0.47603428 0.48698902 0.51185771 0.38799274 0.64752602 0.48698902 0.42178301 0.33402405 0.58158 0.60000118] mean value: 0.49347770092446475 key: train_mcc value: [0.69385167 0.70422287 0.66913791 0.71410816 0.68986411 0.70498382 0.70498382 0.6842722 0.68897398 0.70403264] mean value: 0.6958431183577889 key: test_accuracy value: [0.73333333 0.73333333 0.75555556 0.68888889 0.82222222 0.73333333 0.71111111 0.66666667 0.75555556 0.8 ] mean value: 0.74 key: train_accuracy value: [0.84691358 0.85185185 0.8345679 0.85679012 0.84444444 0.85185185 0.85185185 0.84197531 0.84444444 0.85185185] mean value: 0.8476543209876544 key: test_fscore value: [0.71428571 0.7 0.75555556 0.73076923 0.81818182 0.76 0.69767442 0.63414634 0.66666667 0.79069767] mean value: 0.7267977419945656 key: train_fscore value: [0.84577114 0.84848485 0.8337469 0.85353535 0.83969466 0.84771574 0.84771574 0.84 0.84367246 0.85 ] mean value: 0.8450336829707287 key: test_precision value: [0.78947368 0.82352941 0.77272727 0.65517241 0.85714286 0.67857143 0.71428571 0.68421053 1. 
0.80952381] mean value: 0.7784637118335207 key: train_precision value: [0.85 0.86597938 0.8358209 0.87113402 0.86387435 0.87434555 0.87434555 0.85279188 0.85 0.86294416] mean value: 0.8601235783219559 key: test_recall value: [0.65217391 0.60869565 0.73913043 0.82608696 0.7826087 0.86363636 0.68181818 0.59090909 0.5 0.77272727] mean value: 0.7017786561264823 key: train_recall value: [0.84158416 0.83168317 0.83168317 0.83663366 0.81683168 0.8226601 0.8226601 0.82758621 0.83743842 0.83743842] mean value: 0.8306199092815685 key: test_roc_auc value: [0.73517787 0.73616601 0.75592885 0.68577075 0.82312253 0.73616601 0.71047431 0.66501976 0.75 0.79940711] mean value: 0.7397233201581027 key: train_roc_auc value: [0.84690045 0.85180218 0.8345608 0.85674048 0.84437643 0.85192411 0.85192411 0.84201093 0.84446179 0.85188753] mean value: 0.8476588791884114 key: test_jcc value: [0.55555556 0.53846154 0.60714286 0.57575758 0.69230769 0.61290323 0.53571429 0.46428571 0.5 0.65384615] mean value: 0.5735974598877824 key: train_jcc value: [0.73275862 0.73684211 0.71489362 0.74449339 0.72368421 0.73568282 0.73568282 0.72413793 0.72961373 0.73913043] mean value: 0.731691968406008 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02174282 0.02407169 0.01865649 0.01932096 0.01913285 0.01908374 0.01905656 0.01899648 0.01942968 0.02316642] mean value: 0.020265769958496094 key: score_time value: [0.01160741 0.01241922 0.01122999 0.01308322 0.01144195 0.01153398 0.01152825 0.01177454 0.01203108 0.01304173] mean value: 0.01196913719177246 key: test_mcc value: [0.73663511 0.68972332 0.86732843 0.77821935 0.86758893 0.82574419 0.77865613 0.68911026 0.95652174 0.73320158] mean value: 0.7922729043083154 key: train_mcc value: [0.79762457 0.81737922 0.80251189 0.80246793 0.80766419 0.81237958 0.80246793 0.81736586 0.79262493 0.80741373] mean value: 0.8059899831841102 key: test_accuracy value: [0.86666667 0.84444444 0.93333333 0.88888889 0.93333333 0.91111111 0.88888889 0.84444444 0.97777778 0.86666667] mean value: 0.8955555555555555 key: train_accuracy value: [0.89876543 0.90864198 0.90123457 0.90123457 0.9037037 0.90617284 0.90123457 0.90864198 0.8962963 0.9037037 ] mean value: 0.902962962962963 key: test_fscore value: [0.86363636 0.84444444 0.93617021 0.89361702 0.93333333 0.91304348 0.88888889 0.8372093 0.97777778 0.86363636] mean value: 0.8951757186346175 key: train_fscore value: [0.8992629 0.90909091 0.90147783 0.9009901 0.90464548 0.90686275 0.90147783 0.90953545 0.89705882 0.9041769 ] mean value: 0.903457897428805 key: test_precision value: [0.9047619 0.86363636 0.91666667 0.875 0.95454545 0.875 0.86956522 0.85714286 0.95652174 0.86363636] mean value: 0.893647656691135 key: train_precision value: [0.89268293 0.90243902 0.89705882 0.9009901 0.89371981 0.90243902 0.90147783 0.90291262 0.89268293 
0.90196078] mean value: 0.8988363869926886 key: test_recall value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545 0.90909091 0.81818182 1. 0.86363636] mean value: 0.8980237154150198 key: train_recall value: [0.90594059 0.91584158 0.90594059 0.9009901 0.91584158 0.91133005 0.90147783 0.91625616 0.90147783 0.90640394] mean value: 0.9081500268253426 key: test_roc_auc value: [0.86758893 0.84486166 0.93280632 0.88833992 0.93379447 0.91205534 0.88932806 0.84387352 0.97826087 0.86660079] mean value: 0.8957509881422925 key: train_roc_auc value: [0.8987831 0.90865971 0.90124616 0.90123397 0.9037336 0.90616007 0.90123397 0.90862313 0.89628347 0.90369702] mean value: 0.9029654196946788 key: test_jcc value: [0.76 0.73076923 0.88 0.80769231 0.875 0.84 0.8 0.72 0.95652174 0.76 ] mean value: 0.8129983277591973 key: train_jcc value: [0.81696429 0.83333333 0.8206278 0.81981982 0.82589286 0.82959641 0.8206278 0.83408072 0.81333333 0.82511211] mean value: 0.8239388472392957 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.31762528 2.59691238 1.63772154 1.77554345 1.59011173 1.17077398 2.11147809 1.44927478 0.70069718 1.59665656] mean value: 1.5946794986724853 key: score_time value: [0.01862407 0.02381897 0.01357055 0.01248121 0.02450323 0.02156973 0.01986384 0.02018833 0.01420259 0.01296401] mean value: 0.018178653717041016 key: test_mcc value: [0.73663511 0.64426877 0.86732843 0.77821935 0.86758893 0.82574419 0.77821935 0.73559956 1. 0.69404997] mean value: 0.7927653674534292 key: train_mcc value: [0.83247548 0.8716498 0.82799641 0.8520244 0.83086317 0.79284035 0.83012449 0.83313446 0.78773172 0.81956701] mean value: 0.8278407283052743 key: test_accuracy value: [0.86666667 0.82222222 0.93333333 0.88888889 0.93333333 0.91111111 0.88888889 0.86666667 1. 0.84444444] mean value: 0.8955555555555555 key: train_accuracy value: [0.91604938 0.93580247 0.91358025 0.92592593 0.91358025 0.89382716 0.91358025 0.91604938 0.89382716 0.90864198] mean value: 0.9130864197530864 key: test_fscore value: [0.86363636 0.82608696 0.93617021 0.89361702 0.93333333 0.91304348 0.88372093 0.85714286 1. 
0.82926829] mean value: 0.89360194458532 key: train_fscore value: [0.91707317 0.93596059 0.91525424 0.92647059 0.91725768 0.88772846 0.91002571 0.91414141 0.89486553 0.90537084] mean value: 0.9124148220877728 key: test_precision value: [0.9047619 0.82608696 0.91666667 0.875 0.95454545 0.875 0.9047619 0.9 1. 0.89473684] mean value: 0.9051559729362934 key: train_precision value: [0.90384615 0.93137255 0.8957346 0.91747573 0.87782805 0.94444444 0.9516129 0.93782383 0.88834951 0.94148936] mean value: 0.9189977140608518 key: test_recall value: [0.82608696 0.82608696 0.95652174 0.91304348 0.91304348 0.95454545 0.86363636 0.81818182 1. 0.77272727] mean value: 0.8843873517786561 key: train_recall value: [0.93069307 0.94059406 0.93564356 0.93564356 0.96039604 0.83743842 0.87192118 0.89162562 0.90147783 0.87192118] mean value: 0.9077354533482905 key: test_roc_auc value: [0.86758893 0.82213439 0.93280632 0.88833992 0.93379447 0.91205534 0.88833992 0.86561265 1. 0.84288538] mean value: 0.8953557312252964 key: train_roc_auc value: [0.91608545 0.93581427 0.91363459 0.92594986 0.91369556 0.89396674 0.91368336 0.91610984 0.89380822 0.90873287] mean value: 0.913148075891333 key: test_jcc value: [0.76 0.7037037 0.88 0.80769231 0.875 0.84 0.79166667 0.75 1. 
0.70833333] mean value: 0.8116396011396011 key: train_jcc value: [0.84684685 0.87962963 0.84375 0.8630137 0.84716157 0.79812207 0.83490566 0.84186047 0.80973451 0.8271028 ] mean value: 0.8392127255393006 MCC on Blind test: 0.75 Accuracy on Blind test: 0.88 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03762817 0.0222249 0.02336216 0.02422976 0.02328444 0.02246857 0.02320361 0.02444267 0.02557874 0.02451849] mean value: 0.025094151496887207 key: score_time value: [0.01054001 0.01044989 0.0104568 0.01041985 0.01052427 0.01045942 0.01032162 0.0105691 0.01049876 0.01033282] mean value: 0.010457253456115723 key: test_mcc value: [0.77865613 0.91106719 0.82506438 0.91106719 0.82213439 0.91485328 0.95652174 0.77821935 0.95643752 0.95643752] mean value: 0.8810458680630867 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88888889 0.95555556 0.91111111 0.95555556 0.91111111 0.95555556 0.97777778 0.88888889 0.97777778 0.97777778] mean value: 0.94 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.88888889 0.95652174 0.91666667 0.95652174 0.91304348 0.95652174 0.97777778 0.88372093 0.97674419 0.97674419] mean value: 0.9403151331311089 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 0.95652174 0.88 0.95652174 0.91304348 0.91666667 0.95652174 0.9047619 1. 1. ] mean value: 0.9393128176171655 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86956522 0.95652174 0.95652174 0.95652174 0.91304348 1. 1. 0.86363636 0.95454545 0.95454545] mean value: 0.9424901185770751 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.88932806 0.9555336 0.91007905 0.9555336 0.91106719 0.95652174 0.97826087 0.88833992 0.97727273 0.97727273] mean value: 0.9399209486166008 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.91666667 0.84615385 0.91666667 0.84 0.91666667 0.95652174 0.79166667 0.95454545 0.95454545] mean value: 0.8893433161041857 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.14324355 0.14098144 0.14159226 0.14177227 0.14036822 0.14066744 0.13828254 0.1379981 0.14093471 0.14011884] mean value: 0.14059593677520751 key: score_time value: [0.02067876 0.02081728 0.02086139 0.02107191 0.02067494 0.0208261 0.01972151 0.02075052 0.02083349 0.02075028] mean value: 0.02069861888885498 key: test_mcc value: [0.82574419 0.73320158 0.86758893 0.68911026 0.82574419 0.78530224 0.78530224 0.64426877 0.86732843 0.82213439] mean value: 0.784572523307409 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.86666667 0.93333333 0.84444444 0.91111111 0.88888889 0.88888889 0.82222222 0.93333333 0.91111111] mean value: 0.8911111111111111 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.86956522 0.93333333 0.85106383 0.90909091 0.89361702 0.89361702 0.81818182 0.93023256 0.90909091] mean value: 0.8916883526659143 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.95238095 0.86956522 0.95454545 0.83333333 0.95238095 0.84 0.84 0.81818182 0.95238095 0.90909091] mean value: 0.8921859589685677 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86956522 0.86956522 0.91304348 0.86956522 0.86956522 0.95454545 0.95454545 0.81818182 0.90909091 0.90909091] mean value: 0.8936758893280632 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91205534 0.86660079 0.93379447 0.84387352 0.91205534 0.89031621 0.89031621 0.82213439 0.93280632 0.91106719] mean value: 0.8915019762845849 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.76923077 0.875 0.74074074 0.83333333 0.80769231 0.80769231 0.69230769 0.86956522 0.83333333] mean value: 0.8062229035055122 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01231742 0.01216841 0.0121696 0.01225901 0.01252937 0.01230884 0.01246262 0.01228213 0.01244259 0.01268888] mean value: 0.01236288547515869 key: score_time value: [0.0103507 0.01036882 0.01041508 0.0105629 0.0106926 0.01041341 0.01040411 0.0103581 0.01045394 0.01082134] mean value: 0.010484099388122559 key: test_mcc value: [0.43557241 0.68972332 0.73559956 0.51185771 0.43557241 0.38019877 0.2903816 0.73663511 0.33824342 0.670374 ] mean value: 0.5224158297692548 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71111111 0.84444444 0.86666667 0.75555556 0.71111111 0.68888889 0.64444444 0.86666667 0.66666667 0.82222222] mean value: 0.7577777777777778 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.68292683 0.84444444 0.875 0.75555556 0.68292683 0.69565217 0.6 0.86956522 0.61538462 0.84 ] mean value: 0.7461455665225548 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.77777778 0.86363636 0.84 0.77272727 0.77777778 0.66666667 0.66666667 0.83333333 0.70588235 0.75 ] mean value: 0.7654468211527035 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.60869565 0.82608696 0.91304348 0.73913043 0.60869565 0.72727273 0.54545455 0.90909091 0.54545455 0.95454545] mean value: 0.7377470355731225 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.71343874 0.84486166 0.86561265 0.75592885 0.71343874 0.68972332 0.64229249 0.86758893 0.66403162 0.82509881] mean value: 0.758201581027668 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.51851852 0.73076923 0.77777778 0.60714286 0.51851852 0.53333333 0.42857143 0.76923077 0.44444444 0.72413793] mean value: 0.6052444809341361 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.55 Accuracy on Blind test: 0.77 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.9608767 1.95501304 2.26225162 1.84889507 2.05728984 1.85792685 4.34983826 2.38245821 2.61548281 2.64022255] mean value: 2.3930254936218263 key: score_time value: [0.10743237 0.22150254 0.09776163 0.13473463 0.10169816 0.17076182 0.25548291 0.19140983 0.15087056 0.12873602] mean value: 0.15603904724121093 key: test_mcc value: [0.86758893 0.91106719 0.86732843 0.95643752 0.82574419 0.95652174 0.86758893 0.77821935 1. 0.95643752] mean value: 0.8986933809832185 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.93333333 0.95555556 0.93333333 0.97777778 0.91111111 0.97777778 0.93333333 0.88888889 1. 0.97777778] mean value: 0.9488888888888889 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.95652174 0.93617021 0.9787234 0.90909091 0.97777778 0.93333333 0.88372093 1. 0.97674419] mean value: 0.9485415825966135 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95454545 0.95652174 0.91666667 0.95833333 0.95238095 0.95652174 0.91304348 0.9047619 1. 1. ] mean value: 0.9512775268210051 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.95652174 0.95652174 1. 0.86956522 1. 0.95454545 0.86363636 1. 0.95454545] mean value: 0.9468379446640316 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93379447 0.9555336 0.93280632 0.97727273 0.91205534 0.97826087 0.93379447 0.88833992 1. 0.97727273] mean value: 0.9489130434782609 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.875 0.91666667 0.88 0.95833333 0.83333333 0.95652174 0.875 0.79166667 1. 0.95454545] mean value: 0.9041067193675889 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.95 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.02290845 1.19409299 1.70667768 1.77199078 1.73614359 2.20723319 1.984725 1.91829991 1.85758781 1.00323558] mean value: 1.6402894973754882 key: score_time value: [0.15309381 0.17785645 0.17673326 0.22488499 0.2151401 0.18284273 0.1778214 0.29034138 0.14019394 0.16655993] mean value: 0.19054679870605468 key: test_mcc value: [0.86758893 0.82213439 0.86732843 0.95643752 0.82574419 0.91106719 0.86758893 0.77821935 1. 
0.87406293] mean value: 0.8770171874100532 key: train_mcc value: [0.95556748 0.95061698 0.94568955 0.94078482 0.95556639 0.95066455 0.95556748 0.97532008 0.94578446 0.93590713] mean value: 0.951146893201595 key: test_accuracy value: [0.93333333 0.91111111 0.93333333 0.97777778 0.91111111 0.95555556 0.93333333 0.88888889 1. 0.93333333] mean value: 0.9377777777777778 key: train_accuracy value: [0.97777778 0.97530864 0.97283951 0.97037037 0.97777778 0.97530864 0.97777778 0.98765432 0.97283951 0.96790123] mean value: 0.9755555555555555 key: test_fscore value: [0.93333333 0.91304348 0.93617021 0.9787234 0.90909091 0.95454545 0.93333333 0.88372093 1. 0.92682927] mean value: 0.9368790324110416 key: train_fscore value: [0.97777778 0.97524752 0.97270471 0.97014925 0.97766749 0.97524752 0.97777778 0.98771499 0.97270471 0.96774194] mean value: 0.9754733705067631 key: test_precision value: [0.95454545 0.91304348 0.91666667 0.95833333 0.95238095 0.95454545 0.91304348 0.9047619 1. 1. ] mean value: 0.9467320722755506 key: train_precision value: [0.97536946 0.97524752 0.97512438 0.975 0.9800995 0.9800995 0.98019802 0.98529412 0.98 0.975 ] mean value: 0.978143250341417 key: test_recall value: [0.91304348 0.91304348 0.95652174 1. 0.86956522 0.95454545 0.95454545 0.86363636 1. 0.86363636] mean value: 0.9288537549407114 key: train_recall value: [0.98019802 0.97524752 0.97029703 0.96534653 0.97524752 0.97044335 0.97536946 0.99014778 0.96551724 0.96059113] mean value: 0.9728405599180607 key: test_roc_auc value: [0.93379447 0.91106719 0.93280632 0.97727273 0.91205534 0.9555336 0.93379447 0.88833992 1. 0.93181818] mean value: 0.9376482213438735 key: train_roc_auc value: [0.97778374 0.97530849 0.97283324 0.970358 0.97777155 0.97532068 0.97778374 0.98764815 0.97285763 0.96791933] mean value: 0.9755584548602644 key: test_jcc value: [0.875 0.84 0.88 0.95833333 0.83333333 0.91304348 0.875 0.79166667 1. 
0.86363636] mean value: 0.8830013175230567 key: train_jcc value: [0.95652174 0.95169082 0.9468599 0.94202899 0.95631068 0.95169082 0.95652174 0.97572816 0.9468599 0.9375 ] mean value: 0.9521712747994935 MCC on Blind test: 0.93 Accuracy on Blind test: 0.96 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02517962 0.0100472 0.01011395 0.01026773 0.00996494 0.01005149 0.01018238 0.00989962 0.00991511 0.00988817] mean value: 0.01155102252960205 key: score_time value: [0.00965428 0.00895143 0.00905299 0.00892019 0.0089519 0.00888443 0.00896573 0.0087378 0.00882101 0.00882554] mean value: 0.00897653102874756 key: test_mcc value: [0.70780516 0.4229249 0.68972332 0.73559956 0.78530224 0.69583743 0.55841694 0.64426877 0.69404997 0.55841694] mean value: 0.6492345229004394 key: train_mcc value: [0.7284056 0.69072841 0.70964919 0.73836061 0.7234551 0.75324391 0.75343373 0.76814813 0.70374345 0.72863208] mean value: 0.7297800213531322 key: test_accuracy value: [0.84444444 0.71111111 0.84444444 0.86666667 0.88888889 0.84444444 0.77777778 0.82222222 0.84444444 0.77777778] mean value: 0.8222222222222222 key: train_accuracy value: [0.86419753 0.84444444 0.85432099 0.8691358 0.8617284 0.87654321 0.87654321 0.88395062 0.85185185 
0.86419753] mean value: 0.8646913580246913 key: test_fscore value: [0.82926829 0.71111111 0.84444444 0.875 0.88372093 0.85106383 0.7826087 0.81818182 0.82926829 0.7826087 ] mean value: 0.8207276110427367 key: train_fscore value: [0.86419753 0.83804627 0.84987277 0.86977887 0.86138614 0.87562189 0.875 0.88279302 0.85148515 0.86284289] mean value: 0.8631024534573951 key: test_precision value: [0.94444444 0.72727273 0.86363636 0.84 0.95 0.8 0.75 0.81818182 0.89473684 0.75 ] mean value: 0.8338272195640617 key: train_precision value: [0.86206897 0.87165775 0.87434555 0.86341463 0.86138614 0.88442211 0.88832487 0.89393939 0.85572139 0.87373737] mean value: 0.8729018186387163 key: test_recall value: [0.73913043 0.69565217 0.82608696 0.91304348 0.82608696 0.90909091 0.81818182 0.81818182 0.77272727 0.81818182] mean value: 0.8136363636363636 key: train_recall value: [0.86633663 0.80693069 0.82673267 0.87623762 0.86138614 0.86699507 0.86206897 0.87192118 0.84729064 0.85221675] mean value: 0.8538116373213676 key: test_roc_auc value: [0.84683794 0.71146245 0.84486166 0.86561265 0.89031621 0.8458498 0.77865613 0.82213439 0.84288538 0.77865613] mean value: 0.8227272727272728 key: train_roc_auc value: [0.8642028 0.84435205 0.85425304 0.86915329 0.86172755 0.87656684 0.87657904 0.88398039 0.85186314 0.86422719] mean value: 0.864690533092718 key: test_jcc value: [0.70833333 0.55172414 0.73076923 0.77777778 0.79166667 0.74074074 0.64285714 0.69230769 0.70833333 0.64285714] mean value: 0.6987367198574095 key: train_jcc value: [0.76086957 0.72123894 0.73893805 0.76956522 0.75652174 0.77876106 0.77777778 0.79017857 0.74137931 0.75877193] mean value: 0.7594002164212214 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [1.47456765 1.54497075 1.56331825 1.58203959 1.53839636 1.51352477 1.57174468 1.49755764 1.60622501 1.62361526] mean value: 1.5515959978103637 key: score_time value: [0.01256537 0.0133667 0.01274014 0.01221132 0.01370311 0.01287436 0.01300812 0.01307845 0.01412868 0.01365328] mean value: 0.013132953643798828 key: test_mcc value: [0.82213439 0.91106719 0.95643752 1. 0.86758893 0.91485328 0.95652174 0.77821935 1. 0.95643752] mean value: 0.9163259916823262 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.95555556 0.97777778 1. 0.93333333 0.95555556 0.97777778 0.88888889 1. 0.97777778] mean value: 0.9577777777777777 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.91304348 0.95652174 0.9787234 1. 
0.93333333 0.95652174 0.97777778 0.88372093 1. 0.97674419] mean value: 0.9576386588167239 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91304348 0.95652174 0.95833333 1. 0.95454545 0.91666667 0.95652174 0.9047619 1. 1. ] mean value: 0.9560394315829098 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.95652174 1. 1. 0.91304348 1. 1. 0.86363636 1. 0.95454545] mean value: 0.9600790513833992 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91106719 0.9555336 0.97727273 1. 0.93379447 0.95652174 0.97826087 0.88833992 1. 0.97727273] mean value: 0.957806324110672 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.84 0.91666667 0.95833333 1. 0.875 0.91666667 0.95652174 0.79166667 1. 0.95454545] mean value: 0.9209400527009223 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.12674332 0.10389161 0.06731343 0.08088517 0.083004 0.08519578 0.10193396 0.09383464 0.2874248 0.05935693] mean value: 0.10895836353302002 key: score_time value: [0.02858162 0.0240407 0.0236733 0.03531265 0.02388859 0.0231142 0.0168426 0.01304436 0.01560354 0.01269364] mean value: 0.02167952060699463 key: test_mcc value: [0.86758893 0.69404997 0.82213439 0.69404997 0.82213439 0.69583743 0.73663511 0.69404997 0.82213439 0.73559956] mean value: 0.7584214113139539 key: train_mcc value: [0.91606106 0.91129269 0.91605902 0.92103017 0.93581427 0.92117074 0.91605902 0.93126766 0.92602981 0.89139819] mean value: 0.9186182632580657 key: test_accuracy value: [0.93333333 0.84444444 0.91111111 0.84444444 0.91111111 0.84444444 0.86666667 0.84444444 0.91111111 0.86666667] mean value: 0.8777777777777778 key: train_accuracy value: [0.95802469 0.95555556 0.95802469 0.96049383 0.96790123 0.96049383 0.95802469 0.9654321 0.96296296 0.94567901] mean value: 0.9592592592592593 key: test_fscore value: [0.93333333 0.85714286 0.91304348 0.85714286 0.91304348 0.85106383 0.86956522 0.82926829 0.90909091 0.85714286] mean value: 0.8789837110236018 key: train_fscore value: [0.95802469 0.95588235 0.95781638 0.960199 0.96790123 0.960199 0.95823096 0.96601942 0.96277916 0.94607843] mean value: 0.9593130629395346 key: test_precision value: [0.95454545 0.80769231 0.91304348 0.80769231 0.91304348 0.8 0.83333333 0.89473684 0.90909091 0.9 ] mean value: 0.8733178110981314 key: train_precision value: [0.95566502 0.94660194 0.960199 0.965 0.96551724 0.96984925 0.95588235 0.95215311 0.97 
0.94146341] mean value: 0.9582331336586875 key: test_recall value: [0.91304348 0.91304348 0.91304348 0.91304348 0.91304348 0.90909091 0.90909091 0.77272727 0.90909091 0.81818182] mean value: 0.8883399209486166 key: train_recall value: [0.96039604 0.96534653 0.95544554 0.95544554 0.97029703 0.95073892 0.96059113 0.98029557 0.95566502 0.95073892] mean value: 0.9604960249719553 key: test_roc_auc value: [0.93379447 0.84288538 0.91106719 0.84288538 0.91106719 0.8458498 0.86758893 0.84288538 0.91106719 0.86561265] mean value: 0.8774703557312253 key: train_roc_auc value: [0.95803053 0.95557967 0.95801834 0.96048139 0.96790714 0.96051797 0.95801834 0.96539531 0.96298103 0.94566649] mean value: 0.9592596205433351 key: test_jcc value: [0.875 0.75 0.84 0.75 0.84 0.74074074 0.76923077 0.70833333 0.83333333 0.75 ] mean value: 0.7856638176638177 key: train_jcc value: [0.91943128 0.91549296 0.91904762 0.92344498 0.93779904 0.92344498 0.91981132 0.9342723 0.92822967 0.89767442] mean value: 0.9218648556530884 MCC on Blind test: 0.7 Accuracy on Blind test: 0.85 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02621555 0.01159739 0.01051664 0.01107073 0.01101708 0.01104641 0.01100183 0.01075768 0.01108098 0.01129007] mean value: 0.01255943775177002 key: score_time value: [0.02028179 0.01012492 0.00889969 0.00954199 0.00957394 0.0095036 0.00955319 0.0093534 0.00974369 0.0098424 ] mean value: 0.010641860961914062 key: test_mcc value: [0.74605372 0.51089209 0.82213439 0.82213439 0.78530224 0.82574419 0.55841694 0.64752602 0.83484711 0.64426877] mean value: 0.7197319857368281 key: train_mcc value: [0.72859901 0.7001606 0.6847458 0.74821952 0.72358281 0.75811526 0.72914356 0.77300001 0.73836061 0.73425986] mean value: 0.7318187035926328 key: test_accuracy value: [0.86666667 0.75555556 0.91111111 0.91111111 0.88888889 0.91111111 0.77777778 0.82222222 0.91111111 0.82222222] mean value: 0.8577777777777778 key: train_accuracy value: [0.86419753 0.84938272 0.84197531 0.87407407 0.8617284 0.87901235 0.86419753 0.88641975 0.8691358 0.86666667] mean value: 0.865679012345679 key: test_fscore value: [0.85714286 0.76595745 0.91304348 0.91304348 0.88372093 0.91304348 0.7826087 0.82608696 0.9 0.81818182] mean value: 0.8572829139322266 key: train_fscore value: [0.86215539 0.84398977 0.83756345 0.87281796 0.86 0.87841191 0.86146096 0.88557214 0.86848635 0.86363636] mean value: 0.8634094288327001 key: test_precision value: [0.94736842 0.75 0.91304348 0.91304348 0.95 0.875 0.75 0.79166667 1. 
0.81818182] mean value: 0.8708303862422855 key: train_precision value: [0.87309645 0.87301587 0.859375 0.87939698 0.86868687 0.885 0.8814433 0.89447236 0.875 0.88601036] mean value: 0.877549719680029 key: test_recall value: [0.7826087 0.7826087 0.91304348 0.91304348 0.82608696 0.95454545 0.81818182 0.86363636 0.81818182 0.81818182] mean value: 0.8490118577075099 key: train_recall value: [0.85148515 0.81683168 0.81683168 0.86633663 0.85148515 0.87192118 0.84236453 0.87684729 0.86206897 0.84236453] mean value: 0.8498536799492757 key: test_roc_auc value: [0.86857708 0.75494071 0.91106719 0.91106719 0.89031621 0.91205534 0.77865613 0.82312253 0.90909091 0.82213439] mean value: 0.858102766798419 key: train_roc_auc value: [0.86416622 0.84930254 0.84191338 0.87405502 0.86170317 0.8790299 0.86425157 0.88644345 0.86915329 0.86672682] mean value: 0.8656745354338389 key: test_jcc value: [0.75 0.62068966 0.84 0.84 0.79166667 0.84 0.64285714 0.7037037 0.81818182 0.69230769] mean value: 0.7539406678889438 key: train_jcc value: [0.75770925 0.7300885 0.72052402 0.77433628 0.75438596 0.78318584 0.75663717 0.79464286 0.76754386 0.76 ] mean value: 0.7599053737883451 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0166471 0.02165747 0.01920605 0.01865745 0.01901174 0.02208638 0.01923394 0.02118444 0.03245139 0.02038074] mean value: 0.021051669120788576 key: score_time value: [0.01027346 0.01216531 0.0124557 0.01228547 0.01229858 0.01237655 0.01228023 0.01235175 0.02047372 0.01226282] mean value: 0.012922358512878419 key: test_mcc value: [0.78530224 0.64752602 0.86732843 0.73320158 0.59725988 0.78405645 0.70780516 0.64752602 0.82213439 0.70501339] mean value: 0.7297153566397614 key: train_mcc value: [0.86377146 0.84895551 0.81816266 0.86902982 0.80684222 0.81282858 0.86843671 0.81827627 0.88164702 0.87785481] mean value: 0.8465805044965292 key: test_accuracy value: [0.88888889 0.82222222 0.93333333 0.86666667 0.77777778 0.88888889 0.84444444 0.82222222 0.91111111 0.84444444] mean value: 0.86 key: train_accuracy value: [0.9308642 0.92098765 0.90617284 0.93333333 0.8962963 0.89876543 0.93333333 0.9037037 0.94074074 0.9382716 ] mean value: 0.9202469135802469 key: test_fscore value: [0.88372093 0.81818182 0.93617021 0.86956522 0.73684211 0.87804878 0.85714286 0.82608696 0.90909091 0.82051282] mean value: 0.8535362607590927 key: train_fscore value: [0.92820513 0.91534392 0.91121495 0.93059126 0.8852459 0.88828338 0.93556086 0.91116173 0.94146341 0.93670886] mean value: 0.9183779402635586 key: test_precision value: [0.95 0.85714286 0.91666667 0.86956522 0.93333333 0.94736842 0.77777778 0.79166667 0.90909091 0.94117647] mean value: 0.8893788319710382 key: train_precision value: [0.96276596 0.98295455 0.86283186 0.96791444 0.98780488 0.99390244 
0.90740741 0.84745763 0.93236715 0.96354167] mean value: 0.940894796783545 key: test_recall value: [0.82608696 0.7826087 0.95652174 0.86956522 0.60869565 0.81818182 0.95454545 0.86363636 0.90909091 0.72727273] mean value: 0.8316205533596838 key: train_recall value: [0.8960396 0.85643564 0.96534653 0.8960396 0.8019802 0.80295567 0.96551724 0.98522167 0.95073892 0.91133005] mean value: 0.9031605130956446 key: test_roc_auc value: [0.89031621 0.82312253 0.93280632 0.86660079 0.78162055 0.88735178 0.84683794 0.82312253 0.91106719 0.84189723] mean value: 0.8604743083003953 key: train_roc_auc value: [0.93077842 0.92082866 0.90631859 0.93324148 0.89606399 0.89900258 0.93325367 0.90350193 0.94071599 0.93833829] mean value: 0.9202043603375115 key: test_jcc value: [0.79166667 0.69230769 0.88 0.76923077 0.58333333 0.7826087 0.75 0.7037037 0.83333333 0.69565217] mean value: 0.7481836368140716 key: train_jcc value: [0.86602871 0.84390244 0.83690987 0.87019231 0.79411765 0.79901961 0.87892377 0.83682008 0.88940092 0.88095238] mean value: 0.8496267734106784 MCC on Blind test: 0.75 Accuracy on Blind test: 0.87 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03038692 0.02198148 0.02177858 0.01954556 0.02221322 0.02213001 0.02184844 0.02069044 0.01776695 0.01800418] mean value: 0.021634578704833984 key: score_time value: [0.01426578 0.01328993 0.01486778 0.01319575 0.01274991 0.01249719 0.01248622 0.01217604 0.01213098 0.01217985] mean value: 0.012983942031860351 key: test_mcc value: [0.78530224 0.69404997 0.82213439 0.78405645 0.78530224 0.82213439 0.69583743 0.64752602 0.87406293 0.62869461] mean value: 0.7539100674057231 key: train_mcc value: [0.82016416 0.89684043 0.91614635 0.88695876 0.90644294 0.89949116 0.9023231 0.81395079 0.84022048 0.79853924] mean value: 0.868107740114024 key: test_accuracy value: [0.88888889 0.84444444 0.91111111 0.88888889 0.88888889 0.91111111 0.84444444 0.82222222 0.93333333 0.8 ] mean value: 0.8733333333333333 key: train_accuracy value: [0.90617284 0.94814815 0.95802469 0.94320988 0.95308642 0.94814815 0.95061728 0.90123457 0.91604938 0.89135802] mean value: 0.931604938271605 key: test_fscore value: [0.88372093 0.85714286 0.91304348 0.89795918 0.88372093 0.90909091 0.85106383 0.82608696 0.92682927 0.75675676] mean value: 0.8705415099991635 key: train_fscore value: [0.89893617 0.94890511 0.95760599 0.94403893 0.95238095 0.94601542 0.95192308 0.90909091 0.91005291 0.87978142] mean value: 0.9298730887557013 key: test_precision value: [0.95 0.80769231 0.91304348 0.84615385 0.95 0.90909091 0.8 0.79166667 1. 
0.93333333] mean value: 0.8900980541197933 key: train_precision value: [0.97126437 0.93301435 0.96482412 0.92822967 0.96446701 0.98924731 0.92957746 0.84388186 0.98285714 0.98773006] mean value: 0.9495093349997615 key: test_recall value: [0.82608696 0.91304348 0.91304348 0.95652174 0.82608696 0.90909091 0.90909091 0.86363636 0.86363636 0.63636364] mean value: 0.8616600790513834 key: train_recall value: [0.83663366 0.96534653 0.95049505 0.96039604 0.94059406 0.90640394 0.97536946 0.98522167 0.84729064 0.79310345] mean value: 0.916085450909623 key: test_roc_auc value: [0.89031621 0.84288538 0.91106719 0.88735178 0.89031621 0.91106719 0.8458498 0.82312253 0.93181818 0.79644269] mean value: 0.8730237154150198 key: train_roc_auc value: [0.90600156 0.94819051 0.95800615 0.94325221 0.95305565 0.94825148 0.95055602 0.90102668 0.91621958 0.89160123] mean value: 0.9316161049602497 key: test_jcc value: [0.79166667 0.75 0.84 0.81481481 0.79166667 0.83333333 0.74074074 0.7037037 0.86363636 0.60869565] mean value: 0.7738257941736203 key: train_jcc value: [0.81642512 0.90277778 0.91866029 0.89400922 0.90909091 0.89756098 0.90825688 0.83333333 0.83495146 0.78536585] mean value: 0.8700431810959086 MCC on Blind test: 0.84 Accuracy on Blind test: 0.92 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.1846261 0.16588259 0.17944908 0.16845226 0.20947981 0.17672706 0.16525626 0.16993427 0.16802812 0.17575526] mean value: 0.17635908126831054 key: score_time value: [0.0156827 0.01687837 0.01521039 0.01514649 0.02424717 0.01539278 0.01648426 0.01540637 0.01509404 0.02262664] mean value: 0.017216920852661133 key: test_mcc value: [0.82213439 0.86732843 0.95643752 1. 0.86758893 0.91485328 0.91106719 0.73559956 1. 1. ] mean value: 0.9075009309091597 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.93333333 0.97777778 1. 0.93333333 0.95555556 0.95555556 0.86666667 1. 1. ] mean value: 0.9533333333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.91304348 0.93617021 0.9787234 1. 0.93333333 0.95652174 0.95454545 0.85714286 1. 1. ] mean value: 0.9529480479434226 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91304348 0.91666667 0.95833333 1. 0.95454545 0.91666667 0.95454545 0.9 1. 1. ] mean value: 0.9513801054018445 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.95652174 1. 1. 0.91304348 1. 0.95454545 0.81818182 1. 1. ] mean value: 0.9555335968379447 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91106719 0.93280632 0.97727273 1. 0.93379447 0.95652174 0.9555336 0.86561265 1. 1. ] mean value: 0.9532608695652174 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.84 0.88 0.95833333 1. 0.875 0.91666667 0.91304348 0.75 1. 1. ] mean value: 0.9133043478260869 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.93 Accuracy on Blind test: 0.96 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05411959 0.06604648 0.08949947 0.05077219 0.0609808 0.0644958 0.07106686 0.05174732 0.07121181 0.07347536] mean value: 0.06534156799316407 key: score_time value: [0.02211094 0.03330112 0.0351975 0.02889776 0.02099347 0.02941012 0.02830839 0.02586412 0.02633166 0.02234125] mean value: 0.027275633811950684 key: test_mcc value: [0.82213439 0.91106719 0.91106719 1. 0.86758893 0.91485328 0.91485328 0.82213439 1. 0.87406293] mean value: 0.9037761587267465 key: train_mcc value: [0.98024679 0.98519693 0.98519693 0.99017145 0.97560447 0.98029413 0.98024679 0.99507389 0.98029509 0.98519729] mean value: 0.9837523754632594 key: test_accuracy value: [0.91111111 0.95555556 0.95555556 1. 0.93333333 0.95555556 0.95555556 0.91111111 1. 
0.93333333] mean value: 0.9511111111111111 key: train_accuracy value: [0.99012346 0.99259259 0.99259259 0.99506173 0.98765432 0.99012346 0.99012346 0.99753086 0.99012346 0.99259259] mean value: 0.9918518518518519 key: test_fscore value: [0.91304348 0.95652174 0.95652174 1. 0.93333333 0.95652174 0.95652174 0.90909091 1. 0.92682927] mean value: 0.9508383945499534 key: train_fscore value: [0.99009901 0.99255583 0.99255583 0.99502488 0.98746867 0.99019608 0.99014778 0.99753086 0.99009901 0.99259259] mean value: 0.9918270548106813 key: test_precision value: [0.91304348 0.95652174 0.95652174 1. 0.95454545 0.91666667 0.91666667 0.90909091 1. 1. ] mean value: 0.9523056653491436 key: train_precision value: [0.99009901 0.99502488 0.99502488 1. 1. 0.98536585 0.99014778 1. 0.99502488 0.9950495 ] mean value: 0.9945736778626925 key: test_recall value: [0.91304348 0.95652174 0.95652174 1. 0.91304348 1. 1. 0.90909091 1. 0.86363636] mean value: 0.9511857707509881 key: train_recall value: [0.99009901 0.99009901 0.99009901 0.99009901 0.97524752 0.99507389 0.99014778 0.99507389 0.98522167 0.99014778] mean value: 0.9891308588986978 key: test_roc_auc value: [0.91106719 0.9555336 0.9555336 1. 0.93379447 0.95652174 0.95652174 0.91106719 1. 0.93181818] mean value: 0.9511857707509882 key: train_roc_auc value: [0.9901234 0.99258645 0.99258645 0.9950495 0.98762376 0.9901112 0.9901234 0.99753695 0.99013559 0.99259864] mean value: 0.9918475345071454 key: test_jcc value: [0.84 0.91666667 0.91666667 1. 0.875 0.91666667 0.91666667 0.83333333 1. 
0.86363636] mean value: 0.9078636363636363 key: train_jcc value: [0.98039216 0.98522167 0.98522167 0.99009901 0.97524752 0.98058252 0.9804878 0.99507389 0.98039216 0.98529412] mean value: 0.9838012536555218 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.10892105 0.16002131 0.14787793 0.20454001 0.19232368 0.16132021 0.145684 0.18226552 0.45024633 0.14779735] mean value: 0.19009974002838134 key: score_time value: [0.01462197 0.01497269 0.02363658 0.03207684 0.01924872 0.03103209 0.02994967 0.02701902 0.04569006 0.01469874] mean value: 0.02529463768005371 key: test_mcc value: [0.670374 0.55841694 0.63358389 0.6133209 0.73663511 0.69156407 0.4229249 0.55533597 0.72299881 0.64613475] mean value: 0.625128933432823 key: train_mcc value: [0.99017145 0.99017145 0.98529269 0.98529269 0.98529269 0.98529376 0.99017193 0.99017193 0.99507389 0.99017193] mean value: 0.9887104432367875 key: test_accuracy value: [0.82222222 0.77777778 0.8 0.8 0.86666667 0.82222222 0.71111111 0.77777778 0.84444444 0.82222222] mean value: 0.8044444444444444 key: train_accuracy value: [0.99506173 0.99506173 0.99259259 0.99259259 0.99259259 0.99259259 0.99506173 0.99506173 
0.99753086 0.99506173] mean value: 0.994320987654321 key: test_fscore value: [0.8 0.77272727 0.76923077 0.82352941 0.86363636 0.84615385 0.71111111 0.77272727 0.81081081 0.80952381] mean value: 0.7979450667685961 key: train_fscore value: [0.99502488 0.99502488 0.9925187 0.9925187 0.9925187 0.99255583 0.9950495 0.9950495 0.99753086 0.9950495 ] mean value: 0.9942841071283992 key: test_precision value: [0.94117647 0.80952381 0.9375 0.75 0.9047619 0.73333333 0.69565217 0.77272727 1. 0.85 ] mean value: 0.8394674964847599 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.69565217 0.73913043 0.65217391 0.91304348 0.82608696 1. 0.72727273 0.77272727 0.68181818 0.77272727] mean value: 0.7780632411067193 key: train_recall value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167 0.99014778 0.99014778 0.99507389 0.99014778] mean value: 0.9886382480612593 key: test_roc_auc value: [0.82509881 0.77865613 0.80335968 0.79743083 0.86758893 0.82608696 0.71146245 0.77766798 0.84090909 0.82114625] mean value: 0.8049407114624506 key: train_roc_auc value: [0.9950495 0.9950495 0.99257426 0.99257426 0.99257426 0.99261084 0.99507389 0.99507389 0.99753695 0.99507389] mean value: 0.9943191240306297 key: test_jcc value: [0.66666667 0.62962963 0.625 0.7 0.76 0.73333333 0.55172414 0.62962963 0.68181818 0.68 ] mean value: 0.6657801579008475 key: train_jcc value: [0.99009901 0.99009901 0.98514851 0.98514851 0.98514851 0.98522167 0.99014778 0.99014778 0.99507389 0.99014778] mean value: 0.9886382480612593 MCC on Blind test: 0.62 Accuracy on Blind test: 0.81 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), 
('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 
'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.69311619 0.70693564 0.67376351 0.71329546 0.67440891 0.70312452 0.67457604 0.7066412 0.69384813 0.67763829] mean value: 0.691734790802002 key: score_time value: [0.00995064 0.00967097 0.01044416 0.00957108 0.014184 0.01022768 0.00957584 0.0112319 0.01048136 0.01039362] mean value: 0.010573124885559082 key: test_mcc value: [0.82213439 0.82506438 0.95643752 1. 0.82574419 0.91485328 0.91485328 0.82213439 1. 0.95643752] mean value: 0.9037658939474779 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.91111111 0.97777778 1. 0.91111111 0.95555556 0.95555556 0.91111111 1. 0.97777778] mean value: 0.9511111111111111 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.91304348 0.91666667 0.9787234 1. 0.90909091 0.95652174 0.95652174 0.90909091 1. 0.97674419] mean value: 0.9516403031672055 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91304348 0.88 0.95833333 1. 0.95238095 0.91666667 0.91666667 0.90909091 1. 1. ] mean value: 0.9446182006399397 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.95652174 1. 1. 0.86956522 1. 1. 0.90909091 1. 0.95454545] mean value: 0.9602766798418972 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91106719 0.91007905 0.97727273 1. 0.91205534 0.95652174 0.95652174 0.91106719 1. 
0.97727273] mean value: 0.9511857707509881 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.84 0.84615385 0.95833333 1. 0.83333333 0.91666667 0.91666667 0.83333333 1. 0.95454545] mean value: 0.9099032634032634 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.96 Accuracy on Blind test: 0.98 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.1388762 0.1777842 0.13639593 0.05101848 0.0506146 0.0657537 0.11698508 0.15806198 0.03034258 0.0545578 ] mean value: 0.09803905487060546 key: score_time value: [0.01342249 0.02337122 0.01860356 0.03219318 0.02325344 0.01318908 0.01368833 0.02427459 0.01500654 0.0172019 ] mean value: 0.019420433044433593 key: test_mcc value: [0.60000118 0.55666994 0.38019877 0.56261436 0.22004311 0.2903816 0.21191154 0.24356483 0.5216284 0.33797818] mean value: 0.3924991910418209 key: train_mcc value: [0.9901234 0.97541644 0.99017145 0.98519693 0.72864068 0.76507358 0.93772687 0.89576137 0.98529376 0.78773172] mean value: 0.9041136193904215 key: test_accuracy value: [0.8 0.77777778 0.68888889 0.77777778 0.6 0.64444444 0.6 0.62222222 0.75555556 0.66666667] mean value: 0.6933333333333334 key: train_accuracy value: [0.99506173 0.98765432 0.99506173 0.99259259 0.84691358 0.8691358 0.96790123 0.94567901 0.99259259 
0.89382716] mean value: 0.9486419753086419 key: test_fscore value: [0.80851064 0.79166667 0.68181818 0.8 0.52631579 0.6 0.64 0.60465116 0.71794872 0.68085106] mean value: 0.6851762220825608 key: train_fscore value: [0.9950495 0.98771499 0.99502488 0.99255583 0.81871345 0.84985836 0.96897375 0.94300518 0.99255583 0.89486553] mean value: 0.9438317292087527 key: test_precision value: [0.79166667 0.76 0.71428571 0.74074074 0.66666667 0.66666667 0.57142857 0.61904762 0.82352941 0.64 ] mean value: 0.6994032057267351 key: train_precision value: [0.9950495 0.9804878 1. 0.99502488 1. 1. 0.93981481 0.99453552 1. 0.88834951] mean value: 0.9793262033954039 key: test_recall value: [0.82608696 0.82608696 0.65217391 0.86956522 0.43478261 0.54545455 0.72727273 0.59090909 0.63636364 0.72727273] mean value: 0.6835968379446641 key: train_recall value: [0.9950495 0.9950495 0.99009901 0.99009901 0.69306931 0.73891626 1. 0.89655172 0.98522167 0.90147783] mean value: 0.9185533824318393 key: test_roc_auc value: [0.79940711 0.77667984 0.68972332 0.7756917 0.60375494 0.64229249 0.6027668 0.6215415 0.75296443 0.66798419] mean value: 0.6932806324110672 key: train_roc_auc value: [0.9950617 0.98767254 0.9950495 0.99258645 0.84653465 0.86945813 0.96782178 0.94580061 0.99261084 0.89380822] mean value: 0.9486404428620202 key: test_jcc value: [0.67857143 0.65517241 0.51724138 0.66666667 0.35714286 0.42857143 0.47058824 0.43333333 0.56 0.51612903] mean value: 0.5283416774941345 key: train_jcc value: [0.99014778 0.97572816 0.99009901 0.98522167 0.69306931 0.73891626 0.93981481 0.89215686 0.98522167 0.80973451] mean value: 0.90001100521683 MCC on Blind test: 0.54 Accuracy on Blind test: 0.77 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.05427146 0.05056667 0.02967739 0.04050016 0.04019976 0.04014754 0.04688692 0.03133988 0.03438783 0.03365636] mean value: 0.040163397789001465 key: score_time value: [0.02168655 0.02730417 0.03774905 0.0235827 0.02416897 0.02359438 0.0278523 0.02273226 0.02274179 0.02754211] mean value: 0.02589542865753174 key: test_mcc value: [0.82574419 0.77821935 0.86732843 0.73320158 0.82213439 0.77865613 0.73663511 0.68911026 0.95652174 0.77821935] mean value: 0.7965770525024884 key: train_mcc value: [0.85731376 0.86693826 0.88152664 0.86177295 0.89175679 0.87164354 0.871768 0.871768 0.86176621 0.85221434] mean value: 0.8688468510791374 key: test_accuracy value: [0.91111111 0.88888889 0.93333333 0.86666667 0.91111111 0.88888889 0.86666667 0.84444444 0.97777778 0.88888889] mean value: 0.8977777777777778 key: train_accuracy value: [0.92839506 0.93333333 0.94074074 0.9308642 0.94567901 0.93580247 0.93580247 0.93580247 0.9308642 0.92592593] mean value: 0.934320987654321 key: test_fscore value: [0.90909091 0.89361702 0.93617021 0.86956522 0.91304348 0.88888889 0.86956522 0.8372093 0.97777778 0.88372093] mean value: 0.8978648955401747 key: train_fscore value: [0.92944039 0.93398533 0.9408867 0.93103448 0.94634146 0.93627451 0.93658537 0.93658537 0.93137255 0.92718447] mean value: 0.9349690621598662 key: test_precision value: [0.95238095 0.875 0.91666667 0.86956522 0.91304348 0.86956522 0.83333333 0.85714286 0.95652174 0.9047619 ] mean value: 0.8947981366459627 key: train_precision 
value: [0.9138756 0.92270531 0.93627451 0.92647059 0.93269231 0.93170732 0.92753623 0.92753623 0.92682927 0.9138756 ] mean value: 0.9259502965047404 key: test_recall value: [0.86956522 0.91304348 0.95652174 0.86956522 0.91304348 0.90909091 0.90909091 0.81818182 1. 0.86363636] mean value: 0.9021739130434783 key: train_recall value: [0.94554455 0.94554455 0.94554455 0.93564356 0.96039604 0.9408867 0.94581281 0.94581281 0.93596059 0.9408867 ] mean value: 0.9442032873238062 key: test_roc_auc value: [0.91205534 0.88833992 0.93280632 0.86660079 0.91106719 0.88932806 0.86758893 0.84387352 0.97826087 0.88833992] mean value: 0.8978260869565218 key: train_roc_auc value: [0.9284373 0.93336341 0.94075257 0.93087597 0.94571526 0.93578988 0.93577769 0.93577769 0.93085158 0.92588889] mean value: 0.934323025898649 key: test_jcc value: [0.83333333 0.80769231 0.88 0.76923077 0.84 0.8 0.76923077 0.72 0.95652174 0.79166667] mean value: 0.8167675585284281 key: train_jcc value: [0.86818182 0.87614679 0.88837209 0.87096774 0.89814815 0.88018433 0.88073394 0.88073394 0.87155963 0.86425339] mean value: 0.8779281838677705 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.43447208 0.54856634 0.45230627 0.30222797 0.46330261 0.58453083 0.51913071 0.37581491 0.53456688 0.40650988] mean value: 0.4621428489685059 key: score_time value: [0.02360606 0.03693914 0.02723217 0.01753807 0.02427649 0.02521634 0.02481031 0.03186703 0.04090238 0.03361082] mean value: 0.028599882125854494 key: test_mcc value: [0.82574419 0.77821935 0.86732843 0.77821935 0.82213439 0.78530224 0.73663511 0.64613475 0.95652174 0.77821935] mean value: 0.7974458893590411 key: train_mcc value: [0.85731376 0.86693826 0.88152664 0.90618217 0.93581427 0.91606106 0.80250226 0.92620337 0.86176621 0.85221434] mean value: 0.8806522363523859 key: test_accuracy value: [0.91111111 0.88888889 0.93333333 0.88888889 0.91111111 0.88888889 0.86666667 0.82222222 0.97777778 0.88888889] mean value: 0.8977777777777778 key: train_accuracy value: [0.92839506 0.93333333 0.94074074 0.95308642 0.96790123 0.95802469 0.90123457 0.96296296 0.9308642 0.92592593] mean value: 0.9402469135802469 key: test_fscore value: [0.90909091 0.89361702 0.93617021 0.89361702 0.91304348 0.89361702 0.86956522 0.80952381 0.97777778 0.88372093] mean value: 0.8979743398872972 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:148: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) 
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:151: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
[0.92944039 0.93398533 0.9408867 0.9528536 0.96790123 0.95802469 0.90196078 0.96350365 0.93137255 0.92718447] mean value: 0.9407113391803744 key: test_precision value: [0.95238095 0.875 0.91666667 0.875 0.91304348 0.84 0.83333333 0.85 0.95652174 0.9047619 ] mean value: 0.8916708074534161 key: train_precision value: [0.9138756 0.92270531 0.93627451 0.95522388 0.96551724 0.96039604 0.89756098 0.95192308 0.92682927 0.9138756 ] mean value: 0.9344181502391634 key: test_recall value: [0.86956522 0.91304348 0.95652174 0.91304348 0.91304348 0.95454545 0.90909091 0.77272727 1.
0.86363636] mean value: 0.9065217391304348 key: train_recall value: [0.94554455 0.94554455 0.94554455 0.95049505 0.97029703 0.95566502 0.90640394 0.97536946 0.93596059 0.9408867 ] mean value: 0.9471711456859971 key: test_roc_auc value: [0.91205534 0.88833992 0.93280632 0.88833992 0.91106719 0.89031621 0.86758893 0.82114625 0.97826087 0.88833992] mean value: 0.8978260869565218 key: train_roc_auc value: [0.9284373 0.93336341 0.94075257 0.95308004 0.96790714 0.95803053 0.90122177 0.96293225 0.93085158 0.92588889] mean value: 0.9402465492854705 key: test_jcc value: [0.83333333 0.80769231 0.88 0.80769231 0.84 0.80769231 0.76923077 0.68 0.95652174 0.79166667] mean value: 0.8173829431438128 key: train_jcc value: [0.86818182 0.87614679 0.88837209 0.90995261 0.93779904 0.91943128 0.82142857 0.92957746 0.87155963 0.86425339] mean value: 0.888670269242401 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04649901 0.10190964 0.07873082 0.10802031 0.04602194 0.10666394 0.04753137 0.04401898 0.05204272 0.05098581] mean value: 0.06824245452880859 key: score_time value: [0.01550841 0.02776861 0.01159406 0.01329851 0.01062369 0.01794052 0.01888156 0.01782799 0.01066375 0.0106461 ] mean value: 0.015475320816040038 key: test_mcc value: [0.86452993 0.77352678 0.77352678 0.6882472 0.86452993 0.68252363 0.77352678 0.90909091 0.81818182 0.95553309] mean value: 0.8103216868800538 key: train_mcc value: [0.85858586 0.88929729 0.87383768 0.86873119 0.86391186 0.87374852 0.86373551 0.85876112 0.85354624 0.85380763] mean value: 0.8657962897237084 key: test_accuracy value: [0.93181818 0.88636364 0.88636364 0.84090909 0.93181818 0.84090909 0.88636364 0.95454545 0.90909091 0.97727273] mean value: 0.9045454545454545 key: train_accuracy value: [0.92929293 0.94444444 0.93686869 0.93434343 0.93181818 0.93686869 0.93181818 0.92929293 0.92676768 0.92676768] mean value: 0.9328282828282828 key: test_fscore value: [0.93333333 0.88372093 0.88372093 0.85106383 0.93023256 0.84444444 0.88372093 0.95454545 0.90909091 0.97777778] mean value: 0.9051651097816362 key: train_fscore value: [0.92929293 0.94527363 0.93734336 0.93467337 0.93266833 0.93702771 0.93233083 0.93 0.92695214 0.9276808 ] mean value: 0.9333243089480099 key: test_precision value: [0.91304348 0.9047619 0.9047619 0.8 0.95238095 0.82608696 0.9047619 0.95454545 0.90909091 0.95652174] mean value: 0.9025955204216074 key: train_precision value: [0.92929293 0.93137255 0.93034826 0.93 0.92118227 0.93467337 0.92537313 
0.92079208 0.92462312 0.91625616] mean value: 0.9263913856612664 key: test_recall value: [0.95454545 0.86363636 0.86363636 0.90909091 0.90909091 0.86363636 0.86363636 0.95454545 0.90909091 1. ] mean value: 0.9090909090909091 key: train_recall value: [0.92929293 0.95959596 0.94444444 0.93939394 0.94444444 0.93939394 0.93939394 0.93939394 0.92929293 0.93939394] mean value: 0.9404040404040405 key: test_roc_auc value: [0.93181818 0.88636364 0.88636364 0.84090909 0.93181818 0.84090909 0.88636364 0.95454545 0.90909091 0.97727273] mean value: 0.9045454545454545 key: train_roc_auc value: [0.92929293 0.94444444 0.93686869 0.93434343 0.93181818 0.93686869 0.93181818 0.92929293 0.92676768 0.92676768] mean value: 0.9328282828282828 key: test_jcc value: [0.875 0.79166667 0.79166667 0.74074074 0.86956522 0.73076923 0.79166667 0.91304348 0.83333333 0.95652174] mean value: 0.8293973739625913 key: train_jcc value: [0.86792453 0.89622642 0.88207547 0.87735849 0.87383178 0.88151659 0.87323944 0.86915888 0.86384977 0.86511628] mean value: 0.8750297628491411 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [2.08121514 2.80702353 3.87393689 2.53061247 2.8521018 2.31932235 3.42499232 2.182899 3.24924803 2.97179842] mean value: 2.8293149948120115 key: score_time value: [0.01505876 0.01849389 0.02364302 0.03510594 0.01178694 0.0123539 0.03271341 0.02871919 0.02154374 0.01550484] mean value: 0.021492362022399902 key: test_mcc value: [0.86452993 0.77352678 0.81818182 0.68252363 0.82158384 0.68252363 0.77352678 0.90909091 0.81818182 0.95553309] mean value: 0.8099202235722511 key: train_mcc value: [0.81822356 0.84865804 0.88393985 0.8939508 0.82866339 0.82832509 0.89903576 0.88393985 0.89903576 0.88393985] mean value: 0.8667711957739941 key: test_accuracy value: [0.93181818 0.88636364 0.90909091 0.84090909 0.90909091 0.84090909 0.88636364 0.95454545 0.90909091 0.97727273] mean value: 0.9045454545454545 key: train_accuracy value: [0.90909091 0.92424242 0.94191919 0.9469697 0.91414141 0.91414141 0.94949495 
0.94191919 0.94949495 0.94191919] mean value: 0.9333333333333333 key: test_fscore value: [0.93333333 0.88372093 0.90909091 0.84444444 0.9047619 0.84444444 0.88372093 0.95454545 0.90909091 0.97777778] mean value: 0.9044931037954294 key: train_fscore value: [0.90954774 0.925 0.94235589 0.94710327 0.91542289 0.91457286 0.94974874 0.94235589 0.94974874 0.94235589] mean value: 0.9338211919756527 key: test_precision value: [0.91304348 0.9047619 0.90909091 0.82608696 0.95 0.82608696 0.9047619 0.95454545 0.90909091 0.95652174] mean value: 0.9053990212685865 key: train_precision value: [0.905 0.91584158 0.93532338 0.94472362 0.90196078 0.91 0.945 0.93532338 0.945 0.93532338] mean value: 0.9273496135816325 key: test_recall value: [0.95454545 0.86363636 0.90909091 0.86363636 0.86363636 0.86363636 0.86363636 0.95454545 0.90909091 1. ] mean value: 0.9045454545454545 key: train_recall value: [0.91414141 0.93434343 0.94949495 0.94949495 0.92929293 0.91919192 0.95454545 0.94949495 0.95454545 0.94949495] mean value: 0.9404040404040405 key: test_roc_auc value: [0.93181818 0.88636364 0.90909091 0.84090909 0.90909091 0.84090909 0.88636364 0.95454545 0.90909091 0.97727273] mean value: 0.9045454545454545 key: train_roc_auc value: [0.90909091 0.92424242 0.94191919 0.9469697 0.91414141 0.91414141 0.94949495 0.94191919 0.94949495 0.94191919] mean value: 0.9333333333333333 key: test_jcc value: [0.875 0.79166667 0.83333333 0.73076923 0.82608696 0.73076923 0.79166667 0.91304348 0.83333333 0.95652174] mean value: 0.8282190635451505 key: train_jcc value: [0.83410138 0.86046512 0.89099526 0.89952153 0.8440367 0.84259259 0.90430622 0.89099526 0.90430622 0.89099526] mean value: 0.8762315541890235 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01285553 0.01226926 0.01220226 0.01206255 0.01200318 0.01196909 0.01202083 0.01223373 0.01212716 0.01206827] mean value: 0.01218118667602539 key: score_time value: [0.0109396 0.01060057 0.01065397 0.01066351 0.01072145 0.01063704 0.01068616 0.01052976 0.01061583 0.01066804] mean value: 0.010671591758728028 key: test_mcc value: [0.72727273 0.46225016 0.62330229 0.6882472 0.5547002 0.45454545 0.60678804 0.68252363 0.6882472 0.6882472 ] mean value: 0.6176124107532563 key: train_mcc value: [0.6873189 0.70131223 0.65677139 0.73180407 0.66882888 0.71147617 0.6771364 0.6771364 0.66144272 0.6724898 ] mean value: 0.6845716943284235 key: test_accuracy value: [0.86363636 0.72727273 0.79545455 0.84090909 0.77272727 0.72727273 0.79545455 0.84090909 0.84090909 0.84090909] mean value: 0.8045454545454546 key: train_accuracy value: [0.84090909 0.84848485 0.82575758 0.86363636 0.83080808 0.85353535 0.83585859 0.83585859 0.82828283 0.83333333] mean value: 0.8396464646464646 key: test_fscore value: [0.86363636 0.7 0.75675676 0.82926829 0.75 0.72727273 0.76923077 0.8372093 0.82926829 0.82926829] mean value: 0.7891910797270979 key: train_fscore value: [0.83018868 0.83957219 0.81401617 0.85561497 0.81743869 0.84491979 0.82479784 0.82479784 0.8172043 0.82162162] mean value: 0.8290172105750199 key: test_precision value: [0.86363636 0.77777778 0.93333333 0.89473684 0.83333333 0.72727273 0.88235294 0.85714286 0.89473684 0.89473684] mean 
value: 0.8559059859988652 key: train_precision value: [0.89017341 0.89204545 0.87283237 0.90909091 0.88757396 0.89772727 0.88439306 0.88439306 0.87356322 0.88372093] mean value: 0.8875513656998492 key: test_recall value: [0.86363636 0.63636364 0.63636364 0.77272727 0.68181818 0.72727273 0.68181818 0.81818182 0.77272727 0.77272727] mean value: 0.7363636363636363 key: train_recall value: [0.77777778 0.79292929 0.76262626 0.80808081 0.75757576 0.7979798 0.77272727 0.77272727 0.76767677 0.76767677] mean value: 0.7777777777777778 key: test_roc_auc value: [0.86363636 0.72727273 0.79545455 0.84090909 0.77272727 0.72727273 0.79545455 0.84090909 0.84090909 0.84090909] mean value: 0.8045454545454546 key: train_roc_auc value: [0.84090909 0.84848485 0.82575758 0.86363636 0.83080808 0.85353535 0.83585859 0.83585859 0.82828283 0.83333333] mean value: 0.8396464646464646 key: test_jcc value: [0.76 0.53846154 0.60869565 0.70833333 0.6 0.57142857 0.625 0.72 0.70833333 0.70833333] mean value: 0.6548585762064023 key: train_jcc value: [0.70967742 0.7235023 0.68636364 0.74766355 0.69124424 0.73148148 0.70183486 0.70183486 0.69090909 0.69724771] mean value: 0.7081759154482379 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01238537 0.01245689 0.01242876 0.01219463 0.01234508 0.01249957 0.01238346 0.01238561 0.01246309 0.01236248] mean value: 0.012390494346618652 key: score_time value: [0.01070952 0.01077819 0.01080084 0.01070762 0.01050544 0.01057076 0.0105443 0.01068759 0.01069641 0.01076317] mean value: 0.010676383972167969 key: test_mcc value: [0.82158384 0.32118203 0.81818182 0.59152048 0.59152048 0.50051733 0.63636364 0.77352678 0.77352678 0.77352678] mean value: 0.6601449963922943 key: train_mcc value: [0.74250948 0.68434524 0.75299597 0.77793654 0.70837286 0.74243371 0.74243371 0.72230514 0.75253485 0.7577304 ] mean value: 0.7383597900292419 key: test_accuracy value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75 0.81818182 0.88636364 0.88636364 0.88636364] mean value: 0.8295454545454546 key: train_accuracy value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212 0.87121212 0.86111111 0.87626263 0.87878788] mean value: 0.8689393939393939 key: test_fscore value: [0.91304348 0.63414634 0.90909091 0.8 0.79069767 0.75555556 0.81818182 0.88372093 0.88888889 0.88888889] mean value: 0.8282214484981508 key: train_fscore value: [0.87218045 0.83377309 0.87841191 0.89 0.84895833 0.87088608 0.87088608 0.86005089 0.87657431 0.88 ] mean value: 0.868172113199113 key: test_precision value: [0.875 0.68421053 0.90909091 0.7826087 0.80952381 0.73913043 0.81818182 0.9047619 0.86956522 0.86956522] mean value: 0.8261638533091622 key: train_precision value: [0.86567164 0.87292818 0.86341463 0.88118812 0.87634409 0.87309645 0.87309645 0.86666667 0.87437186 0.87128713] mean 
value: 0.8718065205643388 key: test_recall value: [0.95454545 0.59090909 0.90909091 0.81818182 0.77272727 0.77272727 0.81818182 0.86363636 0.90909091 0.90909091] mean value: 0.8318181818181818 key: train_recall value: [0.87878788 0.7979798 0.89393939 0.8989899 0.82323232 0.86868687 0.86868687 0.85353535 0.87878788 0.88888889] mean value: 0.8651515151515151 key: test_roc_auc value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75 0.81818182 0.88636364 0.88636364 0.88636364] mean value: 0.8295454545454546 key: train_roc_auc value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212 0.87121212 0.86111111 0.87626263 0.87878788] mean value: 0.8689393939393939 key: test_jcc value: [0.84 0.46428571 0.83333333 0.66666667 0.65384615 0.60714286 0.69230769 0.79166667 0.8 0.8 ] mean value: 0.7149249084249084 key: train_jcc value: [0.77333333 0.71493213 0.78318584 0.8018018 0.73755656 0.77130045 0.77130045 0.75446429 0.78026906 0.78571429] mean value: 0.7673858190211428 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01081586 0.01136518 0.01138735 0.01129532 0.01170659 0.01099205 0.01889086 0.01352763 0.03923583 0.01395297] mean value: 0.015316963195800781 key: score_time value: [0.01401186 0.03292465 0.02555895 0.02621388 0.0230813 0.01509142 0.07618952 0.06447315 0.09006906 0.05066252] mean value: 0.0418276309967041 key: test_mcc value: [0.59648091 0.50051733 0.54545455 0.50471461 0.27386128 0.47245559 0.60678804 0.50051733 0.54772256 0.5547002 ] mean value: 0.510321238961513 key: train_mcc value: [0.68718427 0.69199863 0.68700889 0.66697297 0.70710678 0.71366109 0.70739557 0.67275618 0.66182722 0.66670068] mean value: 0.6862612279819491 key: test_accuracy value: [0.79545455 0.75 0.77272727 0.75 0.63636364 0.72727273 0.79545455 0.75 0.77272727 0.77272727] mean value: 0.7522727272727272 key: train_accuracy value: [0.84343434 0.8459596 0.84343434 0.83333333 0.85353535 0.85606061 0.85353535 0.83585859 0.83080808 0.83333333] mean value: 0.8429292929292929 key: test_fscore value: [0.80851064 0.74418605 0.77272727 0.73170732 0.61904762 0.68421053 0.76923077 0.75555556 0.7826087 0.75 ] mean value: 0.7417784440411851 key: train_fscore value: [0.84102564 0.84711779 0.84183673 0.83076923 0.85279188 0.85117493 0.85128205 0.83116883 0.8286445 0.83248731] mean value: 0.8408298907247728 key: test_precision value: [0.76 0.76190476 0.77272727 0.78947368 0.65 0.8125 0.88235294 0.73913043 0.75 0.83333333] mean value: 0.7751422428134973 key: train_precision value: [0.85416667 0.84079602 0.85051546 0.84375 0.85714286 0.88108108 0.86458333 0.85561497 0.83937824 0.83673469] mean value: 
0.8523763327523514 key: test_recall value: [0.86363636 0.72727273 0.77272727 0.68181818 0.59090909 0.59090909 0.68181818 0.77272727 0.81818182 0.68181818] mean value: 0.7181818181818181 key: train_recall value: [0.82828283 0.85353535 0.83333333 0.81818182 0.84848485 0.82323232 0.83838384 0.80808081 0.81818182 0.82828283] mean value: 0.8297979797979798 key: test_roc_auc value: [0.79545455 0.75 0.77272727 0.75 0.63636364 0.72727273 0.79545455 0.75 0.77272727 0.77272727] mean value: 0.7522727272727273 key: train_roc_auc value: [0.84343434 0.8459596 0.84343434 0.83333333 0.85353535 0.85606061 0.85353535 0.83585859 0.83080808 0.83333333] mean value: 0.842929292929293 key: test_jcc value: [0.67857143 0.59259259 0.62962963 0.57692308 0.44827586 0.52 0.625 0.60714286 0.64285714 0.6 ] mean value: 0.5920992589785693 key: train_jcc value: [0.72566372 0.73478261 0.72687225 0.71052632 0.74336283 0.74090909 0.74107143 0.71111111 0.70742358 0.71304348] mean value: 0.7254766409492254 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.0494976 0.01854587 0.01976705 0.01910186 0.01809573 0.01886225 0.01888537 0.0258832 0.02618337 0.02646255] mean value: 0.024128484725952148 key: score_time value: [0.03666711 0.01181078 0.01129031 0.0110836 0.01145768 0.01238108 0.01312828 0.015697 0.01585555 0.01609397] mean value: 0.015546536445617676 key: test_mcc value: [0.86452993 0.77352678 0.81818182 0.6882472 0.7800135 0.68252363 0.73029674 0.90909091 0.81818182 0.77352678] mean value: 0.7838119120880734 key: train_mcc value: [0.7979798 0.81322466 0.81322466 0.8133907 0.80824576 0.81314168 0.80812204 0.79814268 0.80812204 0.7979798 ] mean value: 0.8071573819260374 key: test_accuracy value: [0.93181818 0.88636364 0.90909091 0.84090909 0.88636364 0.84090909 0.86363636 0.95454545 0.90909091 0.88636364] mean value: 0.8909090909090909 key: train_accuracy value: [0.8989899 0.90656566 0.90656566 0.90656566 0.9040404 0.90656566 0.9040404 0.8989899 0.9040404 0.8989899 ] mean value: 0.9035353535353535 key: test_fscore value: [0.93333333 0.88372093 0.90909091 0.85106383 0.87804878 0.84444444 0.85714286 0.95454545 0.90909091 0.88888889] mean value: 0.8909370337044393 key: train_fscore value: [0.8989899 0.90726817 0.90726817 0.90537084 0.905 0.90632911 0.90452261 0.9 0.90452261 0.8989899 ] mean value: 0.9038261322876402 key: test_precision value: [0.91304348 0.9047619 0.90909091 0.8 0.94736842 0.82608696 0.9 0.95454545 0.90909091 0.86956522] mean value: 0.8933553250715722 key: train_precision value: [0.8989899 0.90049751 0.90049751 0.91709845 0.8960396 0.90862944 0.9 0.89108911 0.9 0.8989899 ] mean value: 
0.9011831422946928 key: test_recall value: [0.95454545 0.86363636 0.90909091 0.90909091 0.81818182 0.86363636 0.81818182 0.95454545 0.90909091 0.90909091] mean value: 0.8909090909090909 key: train_recall value: [0.8989899 0.91414141 0.91414141 0.89393939 0.91414141 0.9040404 0.90909091 0.90909091 0.90909091 0.8989899 ] mean value: 0.9065656565656566 key: test_roc_auc value: [0.93181818 0.88636364 0.90909091 0.84090909 0.88636364 0.84090909 0.86363636 0.95454545 0.90909091 0.88636364] mean value: 0.890909090909091 key: train_roc_auc value: [0.8989899 0.90656566 0.90656566 0.90656566 0.9040404 0.90656566 0.9040404 0.8989899 0.9040404 0.8989899 ] mean value: 0.9035353535353535 key: test_jcc value: [0.875 0.79166667 0.83333333 0.74074074 0.7826087 0.73076923 0.75 0.91304348 0.83333333 0.8 ] mean value: 0.8050495478756349 key: train_jcc value: [0.81651376 0.83027523 0.83027523 0.8271028 0.82648402 0.8287037 0.82568807 0.81818182 0.82568807 0.81651376] mean value: 0.8245426472329047 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.06339478 2.06864357 2.89429951 1.91644716 1.80405879 2.09717655 4.19174433 4.01769924 1.65347528 3.00509381] mean value: 2.5712033033370973 key: score_time value: [0.01534939 0.01772881 0.0145216 0.01324105 0.01330876 0.02885771 0.02873945 0.01542616 0.02645946 0.02501845] mean value: 0.019865083694458007 key: test_mcc value: [0.82158384 0.7800135 0.77352678 0.73029674 0.86452993 0.63900965 0.81818182 0.86452993 0.81818182 0.86452993] mean value: 0.7974383949974557 key: train_mcc value: [1. 0.99496218 1. 1. 0.99496218 1. 0.99496218 1. 1. 1. ] mean value: 0.9984886553739265 key: test_accuracy value: [0.90909091 0.88636364 0.88636364 0.86363636 0.93181818 0.81818182 0.90909091 0.93181818 0.90909091 0.93181818] mean value: 0.8977272727272727 key: train_accuracy value: [1. 0.99747475 1. 1. 0.99747475 1. 0.99747475 1. 1. 1. ] mean value: 0.9992424242424243 key: test_fscore value: [0.91304348 0.87804878 0.88888889 0.85714286 0.93023256 0.82608696 0.90909091 0.93023256 0.90909091 0.93023256] mean value: 0.8972090453902583 key: train_fscore value: [1. 0.99746835 1. 1. 0.99746835 1. 0.99746835 1. 1. 1. 
] mean value: 0.9992405063291139 key: test_precision value: [0.875 0.94736842 0.86956522 0.9 0.95238095 0.79166667 0.90909091 0.95238095 0.90909091 0.95238095] mean value: 0.9058924980435278 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.95454545 0.81818182 0.90909091 0.81818182 0.90909091 0.86363636 0.90909091 0.90909091 0.90909091 0.90909091] mean value: 0.8909090909090909 key: train_recall value: [1. 0.99494949 1. 1. 0.99494949 1. 0.99494949 1. 1. 1. ] mean value: 0.9984848484848485 key: test_roc_auc value: [0.90909091 0.88636364 0.88636364 0.86363636 0.93181818 0.81818182 0.90909091 0.93181818 0.90909091 0.93181818] mean value: 0.8977272727272728 key: train_roc_auc value: [1. 0.99747475 1. 1. 0.99747475 1. 0.99747475 1. 1. 1. ] mean value: 0.9992424242424243 key: test_jcc value: [0.84 0.7826087 0.8 0.75 0.86956522 0.7037037 0.83333333 0.86956522 0.83333333 0.86956522] mean value: 0.8151674718196458 key: train_jcc value: [1. 0.99494949 1. 1. 0.99494949 1. 0.99494949 1. 1. 1. 
] mean value: 0.9984848484848485 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03948331 0.0421505 0.02812696 0.02692723 0.0463903 0.02427006 0.02560449 0.04395103 0.02709556 0.02732158] mean value: 0.033132100105285646 key: score_time value: [0.01248169 0.01416612 0.01280332 0.01256847 0.01260996 0.0124898 0.01261306 0.01800823 0.01289749 0.01290941] mean value: 0.013354754447937012 key: test_mcc value: [0.91287093 0.81818182 0.81818182 0.82158384 0.95553309 0.77352678 0.81818182 0.77352678 0.86452993 0.87177979] mean value: 0.8427896597116962 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95454545 0.90909091 0.90909091 0.90909091 0.97727273 0.88636364 0.90909091 0.88636364 0.93181818 0.93181818] mean value: 0.9204545454545454 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95238095 0.90909091 0.90909091 0.9047619 0.97674419 0.88888889 0.90909091 0.88372093 0.93023256 0.93617021] mean value: 0.9200172360489035 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 
0.90909091 0.90909091 0.95 1. 0.86956522 0.90909091 0.9047619 0.95238095 0.88 ] mean value: 0.9283980801806888 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.90909091 0.90909091 0.86363636 0.95454545 0.90909091 0.90909091 0.86363636 0.90909091 1. ] mean value: 0.9136363636363636 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.90909091 0.90909091 0.90909091 0.97727273 0.88636364 0.90909091 0.88636364 0.93181818 0.93181818] mean value: 0.9204545454545455 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90909091 0.83333333 0.83333333 0.82608696 0.95454545 0.8 0.83333333 0.79166667 0.86956522 0.88 ] mean value: 0.8530955204216074 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.20858884 0.16574883 0.16682291 0.16697645 0.16598296 0.16459179 0.17711449 0.17074442 0.16750717 0.21235633] mean value: 0.17664341926574706 key: score_time value: [0.02438045 0.0246768 0.02469969 0.02466249 0.02455401 0.02451372 0.02488565 0.02486062 0.02491927 0.02723241] mean value: 0.024938511848449706 key: test_mcc value: [0.77352678 0.7800135 0.81818182 0.6882472 0.77352678 0.73960026 0.77352678 0.90909091 0.81818182 0.77352678] mean value: 0.7847422639174152 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88636364 0.88636364 0.90909091 0.84090909 0.88636364 0.86363636 0.88636364 0.95454545 0.90909091 0.88636364] mean value: 0.8909090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.87804878 0.90909091 0.85106383 0.88372093 0.875 0.88372093 0.95454545 0.90909091 0.88888889] mean value: 0.8922059521245206 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.86956522 0.94736842 0.90909091 0.8 0.9047619 0.80769231 0.9047619 0.95454545 0.90909091 0.86956522] mean value: 0.8876442245778631 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.81818182 0.90909091 0.90909091 0.86363636 0.95454545 0.86363636 0.95454545 0.90909091 0.90909091] mean value: 0.9 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.88636364 0.88636364 0.90909091 0.84090909 0.88636364 0.86363636 0.88636364 0.95454545 0.90909091 0.88636364] mean value: 0.890909090909091 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.7826087 0.83333333 0.74074074 0.79166667 0.77777778 0.79166667 0.91304348 0.83333333 0.8 ] mean value: 0.8064170692431563 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.82 Accuracy on Blind test: 0.91 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0146656 0.02988982 0.01498175 0.01482368 0.01461172 0.01457262 0.01456785 0.01468992 0.01473713 0.01458812] mean value: 0.016212821006774902 key: score_time value: [0.01279306 0.0220902 0.02778959 0.02868342 0.01232696 0.01224971 0.01235223 0.01237464 0.01246667 0.01231074] mean value: 0.01654372215270996 key: test_mcc value: [0.45643546 0.36363636 0.50051733 0.31851103 0.41294832 0.36980013 0.63900965 0.63636364 0.50471461 0.54772256] mean value: 0.4749659098162487 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.72727273 0.68181818 0.75 0.65909091 0.70454545 0.68181818 0.81818182 0.81818182 0.75 0.77272727] mean value: 0.7363636363636363 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.71428571 0.68181818 0.74418605 0.65116279 0.68292683 0.70833333 0.80952381 0.81818182 0.76595745 0.7826087 ] mean value: 0.7358984666081136 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.68181818 0.76190476 0.66666667 0.73684211 0.65384615 0.85 0.81818182 0.72 0.75 ] mean value: 0.738925968768074 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.68181818 0.68181818 0.72727273 0.63636364 0.63636364 0.77272727 0.77272727 0.81818182 0.81818182 0.81818182] mean value: 0.7363636363636363 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.72727273 0.68181818 0.75 0.65909091 0.70454545 0.68181818 0.81818182 0.81818182 0.75 0.77272727] mean value: 0.7363636363636363 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.55555556 0.51724138 0.59259259 0.48275862 0.51851852 0.5483871 0.68 0.69230769 0.62068966 0.64285714] mean value: 0.5850908253778109 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.52 Accuracy on Blind test: 0.76 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.39341021 2.50459886 1.97631717 2.46608806 2.39209723 2.58320475 2.78075337 2.49290299 3.35915875 2.54243302] mean value: 2.549096441268921 key: score_time value: [0.12756467 0.15043545 0.09386826 0.12860036 0.1271193 0.13112164 0.22609282 0.12711215 0.23179317 0.12944078] mean value: 0.14731485843658448 key: test_mcc value: [1. 0.91287093 0.90909091 0.82158384 0.86452993 0.82158384 0.86452993 1. 0.95553309 0.91287093] mean value: 0.9062593395597373 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.95454545 0.95454545 0.90909091 0.93181818 0.90909091 0.93181818 1. 0.97727273 0.95454545] mean value: 0.9522727272727273 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 0.95454545 0.91304348 0.93023256 0.91304348 0.93333333 1. 0.97674419 0.95652174] mean value: 0.952984518009796 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.95454545 0.875 0.95238095 0.875 0.91304348 1. 1. 
0.91666667] mean value: 0.9486636551853943 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.95454545 0.95454545 0.90909091 0.95454545 0.95454545 1. 0.95454545 1. ] mean value: 0.9590909090909091 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.95454545 0.95454545 0.90909091 0.93181818 0.90909091 0.93181818 1. 0.97727273 0.95454545] mean value: 0.9522727272727273 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 0.91304348 0.84 0.86956522 0.84 0.875 1. 0.95454545 0.91666667] mean value: 0.9117911725955204 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.96 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.94391346 0.95903611 0.93460608 0.94266438 0.93002152 0.95997882 1.03751016 0.9221313 0.98442721 0.95550966] mean value: 0.9569798707962036 key: score_time value: [0.15100384 0.11824775 0.21430612 0.18143821 0.22737098 0.19880962 0.22887874 0.20321155 0.23416042 0.13647127] mean value: 0.18938984870910644 key: test_mcc value: [1. 0.87177979 0.82158384 0.7800135 0.81818182 0.82158384 0.86452993 0.95553309 0.95553309 0.86452993] mean value: 0.8753268816111682 key: train_mcc value: [0.94445649 0.95465504 0.94949495 0.94954339 0.94954339 0.95465504 0.94954339 0.94445649 0.94949495 0.94949495] mean value: 0.9495338084295475 key: test_accuracy value: [1. 0.93181818 0.90909091 0.88636364 0.90909091 0.90909091 0.93181818 0.97727273 0.97727273 0.93181818] mean value: 0.9363636363636363 key: train_accuracy value: [0.97222222 0.97727273 0.97474747 0.97474747 0.97474747 0.97727273 0.97474747 0.97222222 0.97474747 0.97474747] mean value: 0.9747474747474747 key: test_fscore value: [1. 0.92682927 0.9047619 0.89361702 0.90909091 0.91304348 0.93333333 0.97777778 0.97674419 0.93333333] mean value: 0.9368531212173918 key: train_fscore value: [0.9721519 0.97744361 0.97474747 0.97461929 0.97461929 0.97709924 0.97461929 0.9721519 0.97474747 0.97474747] mean value: 0.9746946935394861 key: test_precision value: [1. 1. 0.95 0.84 0.90909091 0.875 0.91304348 0.95652174 1. 
0.91304348] mean value: 0.9356699604743083 /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: train_precision value: [0.97461929 0.97014925 0.97474747 0.97959184 0.97959184 0.98461538 0.97959184 0.97461929 0.97474747 0.97474747] mean value: 0.9767021151473436 key: test_recall value: [1. 0.86363636 0.86363636 0.95454545 0.90909091 0.95454545 0.95454545 1. 0.95454545 0.95454545] mean value: 0.9409090909090909 key: train_recall value: [0.96969697 0.98484848 0.97474747 0.96969697 0.96969697 0.96969697 0.96969697 0.96969697 0.97474747 0.97474747] mean value: 0.9727272727272728 key: test_roc_auc value: [1. 0.93181818 0.90909091 0.88636364 0.90909091 0.90909091 0.93181818 0.97727273 0.97727273 0.93181818] mean value: 0.9363636363636364 key: train_roc_auc value: [0.97222222 0.97727273 0.97474747 0.97474747 0.97474747 0.97727273 0.97474747 0.97222222 0.97474747 0.97474747] mean value: 0.9747474747474747 key: test_jcc value: [1. 
0.86363636 0.82608696 0.80769231 0.83333333 0.84 0.875 0.95652174 0.95454545 0.875 ] mean value: 0.8831816154859633 key: train_jcc value: [0.94581281 0.95588235 0.95073892 0.95049505 0.95049505 0.95522388 0.95049505 0.94581281 0.95073892 0.95073892] mean value: 0.9506433746585062 MCC on Blind test: 0.89 Accuracy on Blind test: 0.95 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01132703 0.01045918 0.01127529 0.01106215 0.01119161 0.01149607 0.01135707 0.01120472 0.01115823 0.01116991] mean value: 0.011170125007629395 key: score_time value: [0.00996137 0.0095737 0.01019239 0.00985885 0.00994515 0.01000214 0.01003671 0.01025009 0.01001143 0.0100646 ] mean value: 0.009989643096923828 key: test_mcc value: [0.82158384 0.32118203 0.81818182 0.59152048 0.59152048 0.50051733 0.63636364 0.77352678 0.77352678 0.77352678] mean value: 0.6601449963922943 key: train_mcc value: [0.74250948 0.68434524 0.75299597 0.77793654 0.70837286 0.74243371 0.74243371 0.72230514 0.75253485 0.7577304 ] mean value: 0.7383597900292419 key: test_accuracy value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75 0.81818182 0.88636364 0.88636364 0.88636364] mean value: 0.8295454545454546 key: train_accuracy value: [0.87121212 0.84090909 0.87626263 
0.88888889 0.85353535 0.87121212 0.87121212 0.86111111 0.87626263 0.87878788] mean value: 0.8689393939393939 key: test_fscore value: [0.91304348 0.63414634 0.90909091 0.8 0.79069767 0.75555556 0.81818182 0.88372093 0.88888889 0.88888889] mean value: 0.8282214484981508 key: train_fscore value: [0.87218045 0.83377309 0.87841191 0.89 0.84895833 0.87088608 0.87088608 0.86005089 0.87657431 0.88 ] mean value: 0.868172113199113 key: test_precision value: [0.875 0.68421053 0.90909091 0.7826087 0.80952381 0.73913043 0.81818182 0.9047619 0.86956522 0.86956522] mean value: 0.8261638533091622 key: train_precision value: [0.86567164 0.87292818 0.86341463 0.88118812 0.87634409 0.87309645 0.87309645 0.86666667 0.87437186 0.87128713] mean value: 0.8718065205643388 key: test_recall value: [0.95454545 0.59090909 0.90909091 0.81818182 0.77272727 0.77272727 0.81818182 0.86363636 0.90909091 0.90909091] mean value: 0.8318181818181818 key: train_recall value: [0.87878788 0.7979798 0.89393939 0.8989899 0.82323232 0.86868687 0.86868687 0.85353535 0.87878788 0.88888889] mean value: 0.8651515151515151 key: test_roc_auc value: [0.90909091 0.65909091 0.90909091 0.79545455 0.79545455 0.75 0.81818182 0.88636364 0.88636364 0.88636364] mean value: 0.8295454545454546 key: train_roc_auc value: [0.87121212 0.84090909 0.87626263 0.88888889 0.85353535 0.87121212 0.87121212 0.86111111 0.87626263 0.87878788] mean value: 0.8689393939393939 key: test_jcc value: [0.84 0.46428571 0.83333333 0.66666667 0.65384615 0.60714286 0.69230769 0.79166667 0.8 0.8 ] mean value: 0.7149249084249084 key: train_jcc value: [0.77333333 0.71493213 0.78318584 0.8018018 0.73755656 0.77130045 0.77130045 0.75446429 0.78026906 0.78571429] mean value: 0.7673858190211428 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, 
random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [1.60504842 1.49270439 1.54242682 1.5466218 2.61647606 0.22439003 1.24004292 1.27762127 0.6597116 1.25325251] mean value: 1.3458295822143556 key: score_time value: [0.01259804 0.01326776 0.0133779 0.01286626 0.01313877 0.01219416 0.01778555 0.01312637 0.01357436 0.01212931] mean value: 0.013405847549438476 key: test_mcc value: [1. 0.86452993 0.86452993 0.86452993 0.95553309 0.82158384 0.90909091 0.95553309 0.95553309 1. ] mean value: 0.9190863807668139 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93181818 0.93181818 0.93181818 0.97727273 0.90909091 0.95454545 0.97727273 0.97727273 1. ] mean value: 0.9590909090909091 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 
1.] mean value: 1.0 key: test_fscore value: [1. 0.93023256 0.93333333 0.93023256 0.97674419 0.91304348 0.95454545 0.97674419 0.97674419 1. ] mean value: 0.9591619940558263 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.95238095 0.91304348 0.95238095 1. 0.875 0.95454545 1. 1. 1. ] mean value: 0.9647350837568229 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.95454545 0.90909091 0.95454545 0.95454545 0.95454545 0.95454545 0.95454545 1. ] mean value: 0.9545454545454546 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.93181818 0.93181818 0.93181818 0.97727273 0.90909091 0.95454545 0.97727273 0.97727273 1. ] mean value: 0.9590909090909091 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.86956522 0.875 0.86956522 0.95454545 0.84 0.91304348 0.95454545 0.95454545 1. ] mean value: 0.9230810276679842 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0410018 0.07751536 0.08132076 0.07177877 0.071033 0.06880021 0.0640676 0.07675624 0.0497086 0.07524252] mean value: 0.067722487449646 key: score_time value: [0.01247358 0.02159452 0.02081919 0.01254892 0.02184677 0.01257586 0.03190947 0.01236224 0.0167625 0.01244116] mean value: 0.017533421516418457 key: test_mcc value: [0.68252363 0.87177979 0.77352678 0.68252363 0.87177979 0.6882472 0.72727273 0.86452993 0.81818182 0.90909091] mean value: 0.7889456217849123 key: train_mcc value: [0.92434853 0.91937955 0.89903576 0.91415307 0.92434853 0.93435535 0.92948262 0.90909091 0.90414419 0.90913729] mean value: 0.9167475808639807 key: test_accuracy value: [0.84090909 0.93181818 0.88636364 0.84090909 0.93181818 0.84090909 0.86363636 0.93181818 0.90909091 0.95454545] mean value: 0.8931818181818182 key: train_accuracy value: [0.96212121 0.95959596 0.94949495 0.95707071 0.96212121 0.96717172 0.96464646 0.95454545 0.9520202 0.95454545] mean value: 0.9583333333333334 key: test_fscore value: [0.84444444 0.92682927 0.88888889 0.84444444 0.92682927 0.85106383 0.86363636 0.93333333 0.90909091 0.95454545] mean value: 
0.8943106204756438 key: train_fscore value: [0.96240602 0.96 0.94974874 0.95717884 0.96240602 0.96708861 0.965 0.95454545 0.95238095 0.95477387] mean value: 0.9585528498971682 key: test_precision value: [0.82608696 1. 0.86956522 0.82608696 1. 0.8 0.86363636 0.91304348 0.90909091 0.95454545] mean value: 0.896205533596838 key: train_precision value: [0.95522388 0.95049505 0.945 0.95477387 0.95522388 0.96954315 0.95544554 0.95454545 0.94527363 0.95 ] mean value: 0.9535524458194542 key: test_recall value: [0.86363636 0.86363636 0.90909091 0.86363636 0.86363636 0.90909091 0.86363636 0.95454545 0.90909091 0.95454545] mean value: 0.8954545454545455 key: train_recall value: [0.96969697 0.96969697 0.95454545 0.95959596 0.96969697 0.96464646 0.97474747 0.95454545 0.95959596 0.95959596] mean value: 0.9636363636363636 key: test_roc_auc value: [0.84090909 0.93181818 0.88636364 0.84090909 0.93181818 0.84090909 0.86363636 0.93181818 0.90909091 0.95454545] mean value: 0.8931818181818182 key: train_roc_auc value: [0.96212121 0.95959596 0.94949495 0.95707071 0.96212121 0.96717172 0.96464646 0.95454545 0.9520202 0.95454545] mean value: 0.9583333333333334 key: test_jcc value: [0.73076923 0.86363636 0.8 0.73076923 0.86363636 0.74074074 0.76 0.875 0.83333333 0.91304348] mean value: 0.8110928741146133 key: train_jcc value: [0.92753623 0.92307692 0.90430622 0.9178744 0.92753623 0.93627451 0.93236715 0.91304348 0.90909091 0.91346154] mean value: 0.9204567588451691 MCC on Blind test: 0.7 Accuracy on Blind test: 0.85 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01386571 0.01546836 0.01592708 0.01628208 0.01617885 0.0131793 0.01606131 0.0107379 0.01122975 0.01280212] mean value: 0.014173245429992676 key: score_time value: [0.01187468 0.01450491 0.01421571 0.01432848 0.01400518 0.01220989 0.0135181 0.00963759 0.00962281 0.01071024] mean value: 0.012462759017944336 key: test_mcc value: [0.81818182 0.50051733 0.81818182 0.63900965 0.50471461 0.54772256 0.6882472 0.81818182 0.86452993 0.86452993] mean value: 0.7063816679047357 key: train_mcc value: [0.74751288 0.68774638 0.76286954 0.77297377 0.67365307 0.73771253 0.77281598 0.72786709 0.73308094 0.72786709] mean value: 0.7344099272046745 key: test_accuracy value: [0.90909091 0.75 0.90909091 0.81818182 0.75 0.77272727 0.84090909 0.90909091 0.93181818 0.93181818] mean value: 0.8522727272727273 key: train_accuracy value: [0.87373737 0.84343434 0.88131313 0.88636364 0.83585859 0.86868687 0.88636364 0.86363636 0.86616162 0.86363636] mean value: 0.8669191919191919 key: test_fscore value: [0.90909091 0.74418605 0.90909091 0.82608696 0.73170732 0.76190476 0.82926829 0.90909091 0.93333333 0.93333333] mean value: 0.8487092768633621 key: train_fscore value: [0.87309645 0.83937824 0.8797954 0.88491049 0.82939633 0.86666667 0.88549618 0.86082474 0.8630491 0.86082474] mean value: 0.8643438322870827 key: test_precision value: [0.90909091 0.76190476 0.90909091 0.79166667 0.78947368 0.8 0.89473684 0.90909091 0.91304348 0.91304348] mean value: 0.8591141638681684 key: train_precision value: [0.87755102 0.86170213 0.89119171 0.89637306 0.86338798 0.88020833 0.89230769 0.87894737 0.88359788 
0.87894737] mean value: 0.8804214539130206 key: test_recall value: [0.90909091 0.72727273 0.90909091 0.86363636 0.68181818 0.72727273 0.77272727 0.90909091 0.95454545 0.95454545] mean value: 0.8409090909090909 key: train_recall value: [0.86868687 0.81818182 0.86868687 0.87373737 0.7979798 0.85353535 0.87878788 0.84343434 0.84343434 0.84343434] mean value: 0.848989898989899 key: test_roc_auc value: [0.90909091 0.75 0.90909091 0.81818182 0.75 0.77272727 0.84090909 0.90909091 0.93181818 0.93181818] mean value: 0.8522727272727273 key: train_roc_auc value: [0.87373737 0.84343434 0.88131313 0.88636364 0.83585859 0.86868687 0.88636364 0.86363636 0.86616162 0.86363636] mean value: 0.8669191919191919 key: test_jcc value: [0.83333333 0.59259259 0.83333333 0.7037037 0.57692308 0.61538462 0.70833333 0.83333333 0.875 0.875 ] mean value: 0.7446937321937322 key: train_jcc value: [0.77477477 0.72321429 0.78538813 0.79357798 0.70852018 0.76470588 0.79452055 0.75565611 0.75909091 0.75565611] mean value: 0.7615104905950141 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01290369 0.02171993 0.0183773 0.02184057 0.02121806 0.01895308 0.01816034 0.01787543 0.02349734 0.01725245] mean value: 0.019179821014404297 key: score_time value: [0.00926185 0.01158953 0.0118134 0.01216078 0.01294613 0.01253366 0.01261091 0.01218534 0.01236916 0.01217318] mean value: 0.01196439266204834 key: test_mcc value: [0.54794903 0.77352678 0.62330229 0.64715023 0.58554004 0.32539569 0.77352678 0.91287093 0.77352678 0.79349205] mean value: 0.6756280609159855 key: train_mcc value: [0.73305263 0.92036649 0.81060226 0.8786935 0.5976219 0.57346234 0.87374852 0.82790197 0.86140292 0.73125738] mean value: 0.7808109898653016 key: test_accuracy value: [0.75 0.88636364 0.79545455 0.81818182 0.77272727 0.63636364 0.88636364 0.95454545 0.88636364 0.88636364] mean value: 0.8272727272727273 key: train_accuracy value: [0.8510101 0.95959596 0.90151515 0.93686869 0.76767677 0.74747475 0.93686869 0.91161616 0.92929293 0.85606061] mean value: 0.8797979797979798 key: test_fscore value: [0.68571429 0.88372093 0.82352941 0.8 0.80769231 0.5 0.88372093 0.95238095 0.88888889 0.89795918] mean value: 0.8123606890579727 key: train_fscore value: [0.8259587 0.95854922 0.90780142 0.93333333 0.80991736 0.66216216 0.93670886 0.90666667 0.93203883 0.8707483 ] mean value: 0.8743884855867281 key: test_precision value: [0.92307692 0.9047619 0.72413793 0.88888889 0.7 0.8 0.9047619 1. 0.86956522 0.81481481] mean value: 0.8530007584730224 key: train_precision value: [0.9929078 0.98404255 0.85333333 0.98870056 0.68531469 1. 
0.93908629 0.96045198 0.89719626 0.79012346] mean value: 0.9091156928519439 key: test_recall value: [0.54545455 0.86363636 0.95454545 0.72727273 0.95454545 0.36363636 0.86363636 0.90909091 0.90909091 1. ] mean value: 0.8090909090909091 key: train_recall value: [0.70707071 0.93434343 0.96969697 0.88383838 0.98989899 0.49494949 0.93434343 0.85858586 0.96969697 0.96969697] mean value: 0.8712121212121212 key: test_roc_auc value: [0.75 0.88636364 0.79545455 0.81818182 0.77272727 0.63636364 0.88636364 0.95454545 0.88636364 0.88636364] mean value: 0.8272727272727273 key: train_roc_auc value: [0.8510101 0.95959596 0.90151515 0.93686869 0.76767677 0.74747475 0.93686869 0.91161616 0.92929293 0.85606061] mean value: 0.8797979797979798 key: test_jcc value: [0.52173913 0.79166667 0.7 0.66666667 0.67741935 0.33333333 0.79166667 0.90909091 0.8 0.81481481] mean value: 0.7006397542512549 key: train_jcc value: [0.70351759 0.92039801 0.83116883 0.875 0.68055556 0.49494949 0.88095238 0.82926829 0.87272727 0.77108434] mean value: 0.7859621763275807 MCC on Blind test: 0.73 Accuracy on Blind test: 0.86 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02216911 0.01927686 0.01920557 0.02066255 0.01956725 0.01928711 0.01948738 0.02105021 0.02236152 0.0231297 ] mean value: 0.020619726181030272 key: score_time value: [0.01260042 0.01255798 0.01213145 0.01228118 0.01218748 0.01229119 0.01216459 0.01225376 0.0122602 0.01242185] mean value: 0.012315011024475098 key: test_mcc value: [0.70014004 0.79349205 0.47140452 0.68252363 0.82158384 0.73029674 0.60678804 0.87177979 0.75592895 0.66143783] mean value: 0.7095375421713689 key: train_mcc value: [0.81587826 0.79415212 0.62017367 0.91471323 0.89002473 0.89940294 0.72894554 0.89180538 0.77045723 0.78127257] mean value: 0.8106825677849934 key: test_accuracy value: [0.84090909 0.88636364 0.68181818 0.84090909 0.90909091 0.86363636 0.79545455 0.93181818 0.86363636 0.81818182] mean value: 0.8431818181818181 key: train_accuracy value: [0.90151515 0.88888889 0.77777778 0.95707071 0.94444444 0.94949495 0.84848485 0.94444444 0.87373737 0.88131313] mean value: 0.8967171717171717 key: test_fscore value: [0.82051282 0.87179487 0.53333333 0.8372093 0.9047619 0.86956522 0.76923077 0.92682927 0.84210526 0.78947368] mean value: 0.8164816435011689 key: train_fscore value: [0.89196676 0.87640449 0.71428571 0.9562982 0.94300518 0.94871795 0.82248521 0.94210526 0.85632184 0.86685552] mean value: 0.8818446131668011 key: test_precision value: [0.94117647 1. 1. 0.85714286 0.95 0.83333333 0.88235294 1. 1. 0.9375 ] mean value: 0.9401505602240896 key: train_precision value: [0.98773006 0.98734177 1. 
0.97382199 0.96808511 0.96354167 0.99285714 0.98351648 0.99333333 0.98709677] mean value: 0.9837324329980541 key: test_recall value: [0.72727273 0.77272727 0.36363636 0.81818182 0.86363636 0.90909091 0.68181818 0.86363636 0.72727273 0.68181818] mean value: 0.740909090909091 key: train_recall value: [0.81313131 0.78787879 0.55555556 0.93939394 0.91919192 0.93434343 0.7020202 0.9040404 0.75252525 0.77272727] mean value: 0.8080808080808081 key: test_roc_auc value: [0.84090909 0.88636364 0.68181818 0.84090909 0.90909091 0.86363636 0.79545455 0.93181818 0.86363636 0.81818182] mean value: 0.8431818181818181 key: train_roc_auc value: [0.90151515 0.88888889 0.77777778 0.95707071 0.94444444 0.94949495 0.84848485 0.94444444 0.87373737 0.88131313] mean value: 0.8967171717171717 key: test_jcc value: [0.69565217 0.77272727 0.36363636 0.72 0.82608696 0.76923077 0.625 0.86363636 0.72727273 0.65217391] mean value: 0.7015416539981757 key: train_jcc value: [0.805 0.78 0.55555556 0.91625616 0.89215686 0.90243902 0.69849246 0.89054726 0.74874372 0.765 ] mean value: 0.7954191044912481 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.23656082 0.17917037 0.17021155 0.16980386 0.17501688 0.16724229 0.17781687 0.22106504 0.16424084 0.16352606] mean value: 0.18246545791625976 key: score_time value: [0.02395177 0.01514435 0.01610875 0.01673985 0.01671529 0.01640296 0.02107787 0.01673555 0.01646042 0.01520896] mean value: 0.01745457649230957 key: test_mcc value: [1. 0.86452993 0.77352678 0.77352678 0.95553309 0.86452993 0.90909091 0.95553309 0.95553309 0.95553309] mean value: 0.9007336690106234 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93181818 0.88636364 0.88636364 0.97727273 0.93181818 0.95454545 0.97727273 0.97727273 0.97727273] mean value: 0.95 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.93023256 0.88372093 0.88888889 0.97674419 0.93333333 0.95454545 0.97674419 0.97674419 0.97777778] mean value: 0.9498731501057083 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.95238095 0.9047619 0.86956522 1. 0.91304348 0.95454545 1. 1. 0.95652174] mean value: 0.955081874647092 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.86363636 0.90909091 0.95454545 0.95454545 0.95454545 0.95454545 0.95454545 1. ] mean value: 0.9454545454545454 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 
0.93181818 0.88636364 0.88636364 0.97727273 0.93181818 0.95454545 0.97727273 0.97727273 0.97727273] mean value: 0.9500000000000001 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.86956522 0.79166667 0.8 0.95454545 0.875 0.91304348 0.95454545 0.95454545 0.95652174] mean value: 0.9069433465085639 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05210972 0.05436349 0.05617356 0.05209541 0.07101512 0.03992128 0.05424643 0.06785297 0.05630231 0.06146455] mean value: 0.05655448436737061 key: score_time value: [0.01932168 0.02250862 0.01773334 0.02535009 0.02827978 0.01752615 0.0307796 0.03320765 0.02754951 0.02019429] mean value: 0.024245071411132812 key: test_mcc value: [1. 0.86452993 0.86452993 0.86452993 0.95553309 0.82158384 0.90909091 0.91287093 0.95553309 0.95553309] mean value: 0.9103734736843416 key: train_mcc value: [0.98496155 1. 0.98994949 1. 0.98994949 0.99496218 1. 0.99496218 1. 0.96974644] mean value: 0.9924531348618916 key: test_accuracy value: [1. 
0.93181818 0.93181818 0.93181818 0.97727273 0.90909091 0.95454545 0.95454545 0.97727273 0.97727273] mean value: 0.9545454545454546 key: train_accuracy value: [0.99242424 1. 0.99494949 1. 0.99494949 0.99747475 1. 0.99747475 1. 0.98484848] mean value: 0.9962121212121212 key: test_fscore value: [1. 0.93023256 0.93333333 0.93333333 0.97777778 0.91304348 0.95454545 0.95238095 0.97674419 0.97777778] mean value: 0.9549168851595545 key: train_fscore value: [0.99236641 1. 0.99497487 1. 0.99492386 0.99746835 1. 0.99746835 1. 0.98477157] mean value: 0.996197342691844 key: test_precision value: [1. 0.95238095 0.91304348 0.91304348 0.95652174 0.875 0.95454545 1. 1. 0.95652174] mean value: 0.9521056841709016 key: train_precision value: [1. 1. 0.99 1. 1. 1. 1. 1. 1. 0.98979592] mean value: 0.9979795918367347 key: test_recall value: [1. 0.90909091 0.95454545 0.95454545 1. 0.95454545 0.95454545 0.90909091 0.95454545 1. ] mean value: 0.9590909090909091 key: train_recall value: [0.98484848 1. 1. 1. 0.98989899 0.99494949 1. 0.99494949 1. 0.97979798] mean value: 0.9944444444444445 key: test_roc_auc value: [1. 0.93181818 0.93181818 0.93181818 0.97727273 0.90909091 0.95454545 0.95454545 0.97727273 0.97727273] mean value: 0.9545454545454546 key: train_roc_auc value: [0.99242424 1. 0.99494949 1. 0.99494949 0.99747475 1. 0.99747475 1. 0.98484848] mean value: 0.9962121212121212 key: test_jcc value: [1. 0.86956522 0.875 0.875 0.95652174 0.84 0.91304348 0.90909091 0.95454545 0.95652174] mean value: 0.9149288537549407 key: train_jcc value: [0.98484848 1. 0.99 1. 0.98989899 0.99494949 1. 0.99494949 1. 
0.97 ] mean value: 0.9924646464646465 MCC on Blind test: 0.96 Accuracy on Blind test: 0.98 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.07493091 0.11247015 0.18619466 0.14949703 0.53921151 0.20738435 0.13005686 0.14361334 0.1468308 0.12836099] mean value: 0.18185505867004395 key: score_time value: [0.01473236 0.01457143 0.02343249 0.02896667 0.05819082 0.03038406 0.02774858 0.02561402 0.01513457 0.02963424] mean value: 0.026840925216674805 key: test_mcc value: [0.83205029 0.60678804 0.63636364 0.45643546 0.45454545 0.45643546 0.77352678 0.72727273 0.63900965 0.59152048] mean value: 0.617394799410964 key: train_mcc value: [0.98496155 0.98496155 0.98496155 0.98496155 0.99496218 0.98994949 0.98994949 0.98496155 0.98994949 0.98994949] mean value: 0.9879567906059172 key: test_accuracy value: [0.90909091 0.79545455 0.81818182 0.72727273 0.72727273 0.72727273 0.88636364 0.86363636 0.81818182 0.79545455] mean value: 0.8068181818181819 key: train_accuracy value: [0.99242424 0.99242424 0.99242424 0.99242424 0.99747475 0.99494949 0.99494949 0.99242424 0.99494949 0.99494949] mean value: 0.9939393939393939 key: test_fscore value: [0.91666667 0.76923077 0.81818182 0.71428571 0.72727273 
0.71428571 0.88888889 0.86363636 0.80952381 0.79069767] mean value: 0.8012670146391077 key: train_fscore value: [0.99236641 0.99236641 0.99236641 0.99236641 0.99746835 0.99492386 0.99492386 0.99236641 0.99492386 0.99492386] mean value: 0.9938995846971164 key: test_precision value: [0.84615385 0.88235294 0.81818182 0.75 0.72727273 0.75 0.86956522 0.86363636 0.85 0.80952381] mean value: 0.8166686723336339 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.68181818 0.81818182 0.68181818 0.72727273 0.68181818 0.90909091 0.86363636 0.77272727 0.77272727] mean value: 0.7909090909090909 key: train_recall value: [0.98484848 0.98484848 0.98484848 0.98484848 0.99494949 0.98989899 0.98989899 0.98484848 0.98989899 0.98989899] mean value: 0.9878787878787879 key: test_roc_auc value: [0.90909091 0.79545455 0.81818182 0.72727273 0.72727273 0.72727273 0.88636364 0.86363636 0.81818182 0.79545455] mean value: 0.8068181818181818 key: train_roc_auc value: [0.99242424 0.99242424 0.99242424 0.99242424 0.99747475 0.99494949 0.99494949 0.99242424 0.99494949 0.99494949] mean value: 0.993939393939394 key: test_jcc value: [0.84615385 0.625 0.69230769 0.55555556 0.57142857 0.55555556 0.8 0.76 0.68 0.65384615] mean value: 0.6739847374847375 key: train_jcc value: [0.98484848 0.98484848 0.98484848 0.98484848 0.99494949 0.98989899 0.98989899 0.98484848 0.98989899 0.98989899] mean value: 0.9878787878787879 MCC on Blind test: 0.59 Accuracy on Blind test: 0.79 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), 
('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.72530508 0.64669752 0.66160345 0.69841456 0.70706224 0.64407086 0.68081927 0.73172593 0.72187114 0.73497915] mean value: 0.695254921913147 key: score_time value: [0.00961423 0.0095017 0.0108285 0.01087284 0.0114634 0.00980496 0.01087284 0.01086545 0.01108456 0.01084566] mean value: 0.010575413703918457 key: test_mcc value: [1. 0.86452993 0.86452993 0.81818182 0.95553309 0.82158384 0.90909091 1. 0.95553309 0.91287093] mean value: 0.9101853534252073 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93181818 0.93181818 0.90909091 0.97727273 0.90909091 0.95454545 1. 0.97727273 0.95454545] mean value: 0.9545454545454546 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.93023256 0.93333333 0.90909091 0.97674419 0.91304348 0.95454545 1. 0.97674419 0.95652174] mean value: 0.9550255844593559 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.95238095 0.91304348 0.90909091 1. 0.875 0.95454545 1. 1. 0.91666667] mean value: 0.9520727460944852 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90909091 0.95454545 0.90909091 0.95454545 0.95454545 0.95454545 1. 0.95454545 1. ] mean value: 0.9590909090909091 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.93181818 0.93181818 0.90909091 0.97727273 0.90909091 0.95454545 1. 
0.97727273 0.95454545] mean value: 0.9545454545454546 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.86956522 0.875 0.83333333 0.95454545 0.84 0.91304348 1. 0.95454545 0.91666667] mean value: 0.9156699604743083 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, 
random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.06525111 0.04378963 0.03224754 0.06311655 0.06546879 0.1236515 0.13866115 0.07320833 0.10163593 0.10575056] mean value: 0.08127810955047607 key: score_time value: [0.02526879 0.01164603 0.01404119 0.01201868 0.01778102 0.02148128 0.01306319 0.03238225 0.0200932 0.01537132] mean value: 0.018314695358276366 key: test_mcc value: [ 0.41294832 0.30618622 0.40951418 0.59152048 0.2773501 0.37796447 0.29277002 -0.05634362 0.33562431 0.22750788] mean value: 0.3175042364901061 key: train_mcc value: [0.97485938 0.6751906 0.97984797 0.97984797 0.84241805 0.8407714 0.66332496 0.73135745 0.9459053 0.97984797] mean value: 0.8613371053146376 key: test_accuracy value: [0.70454545 0.63636364 0.70454545 0.79545455 0.63636364 0.68181818 0.63636364 0.47727273 0.65909091 0.61363636] mean value: 0.6545454545454545 key: train_accuracy value: [0.98737374 0.81313131 0.98989899 0.98989899 0.91666667 0.91414141 
0.80555556 0.84848485 0.97222222 0.98989899] mean value: 0.9227272727272727 key: test_fscore value: [0.72340426 0.52941176 0.69767442 0.8 0.66666667 0.63157895 0.55555556 0.25806452 0.70588235 0.62222222] mean value: 0.6190460699512758 key: train_fscore value: [0.98746867 0.77018634 0.98994975 0.98994975 0.91008174 0.90607735 0.75862069 0.82142857 0.97297297 0.98994975] mean value: 0.9096685579306306 key: test_precision value: [0.68 0.75 0.71428571 0.7826087 0.61538462 0.75 0.71428571 0.44444444 0.62068966 0.60869565] mean value: 0.6680394491398989 key: train_precision value: [0.9800995 1. 0.985 0.985 0.98816568 1. 1. 1. 0.94736842 0.985 ] mean value: 0.9870633604013567 key: test_recall value: [0.77272727 0.40909091 0.68181818 0.81818182 0.72727273 0.54545455 0.45454545 0.18181818 0.81818182 0.63636364] mean value: 0.6045454545454545 key: train_recall value: [0.99494949 0.62626263 0.99494949 0.99494949 0.84343434 0.82828283 0.61111111 0.6969697 1. 0.99494949] mean value: 0.8585858585858586 key: test_roc_auc value: [0.70454545 0.63636364 0.70454545 0.79545455 0.63636364 0.68181818 0.63636364 0.47727273 0.65909091 0.61363636] mean value: 0.6545454545454545 key: train_roc_auc value: [0.98737374 0.81313131 0.98989899 0.98989899 0.91666667 0.91414141 0.80555556 0.84848485 0.97222222 0.98989899] mean value: 0.9227272727272727 key: test_jcc value: [0.56666667 0.36 0.53571429 0.66666667 0.5 0.46153846 0.38461538 0.14814815 0.54545455 0.4516129 ] mean value: 0.4620417062029965 key: train_jcc value: [0.97524752 0.62626263 0.9800995 0.9800995 0.835 0.82828283 0.61111111 0.6969697 0.94736842 0.9800995 ] mean value: 0.8460540715894056 MCC on Blind test: 0.52 Accuracy on Blind test: 0.76 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest 
Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.0181272 0.01742435 0.03097105 0.05920911 0.03365254 0.03182578 0.04450655 0.05021977 0.03421092 0.03448963] mean value: 0.035463690757751465 key: score_time value: [0.01237655 0.01228476 0.01283073 0.03200722 0.0200367 0.01936555 0.03299618 0.03163648 0.03088164 0.03133059] mean value: 0.023574638366699218 key: test_mcc value: [0.81818182 0.77352678 0.77352678 0.68252363 0.86452993 0.73029674 0.77352678 0.95553309 0.81818182 0.91287093] mean value: 0.810269831392801 key: train_mcc value: [0.87374852 0.8693968 0.86391186 0.86873119 0.85876112 0.89404202 0.86886419 0.86373551 0.87896726 0.85363334] mean value: 0.8693791805866737 key: test_accuracy value: [0.90909091 0.88636364 0.88636364 0.84090909 0.93181818 0.86363636 0.88636364 0.97727273 0.90909091 0.95454545] mean value: 0.9045454545454545 key: train_accuracy value: [0.93686869 0.93434343 0.93181818 0.93434343 0.92929293 0.9469697 0.93434343 0.93181818 0.93939394 0.92676768] mean value: 0.9345959595959596 key: test_fscore value: [0.90909091 0.88372093 0.88888889 0.84444444 0.93023256 0.86956522 0.88372093 0.97674419 0.90909091 0.95652174] mean value: 0.9052020712688054 
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:168: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) 
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:171: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) 
key: train_fscore value: [0.93670886 0.93564356 0.93266833 0.93467337 0.93 0.94656489 0.935 0.93233083 0.94 0.9273183 ] mean value: 0.9350908129430359 key: test_precision value: [0.90909091 0.9047619 0.86956522 0.82608696 0.95238095 0.83333333 0.9047619 1. 0.90909091 0.91666667] mean value: 0.9025738753999624 key: train_precision value: [0.93908629 0.91747573 0.92118227 0.93 0.92079208 0.95384615 0.92574257 0.92537313 0.93069307 0.92039801] mean value: 0.9284589309478474 key: test_recall value: [0.90909091 0.86363636 0.90909091 0.86363636 0.90909091 0.90909091 0.86363636 0.95454545 0.90909091 1. ] mean value: 0.9090909090909091 key: train_recall value: [0.93434343 0.95454545 0.94444444 0.93939394 0.93939394 0.93939394 0.94444444 0.93939394 0.94949495 0.93434343] mean value: 0.9419191919191919 key: test_roc_auc value: [0.90909091 0.88636364 0.88636364 0.84090909 0.93181818 0.86363636 0.88636364 0.97727273 0.90909091 0.95454545] mean value: 0.9045454545454545 key: train_roc_auc value: [0.93686869 0.93434343 0.93181818 0.93434343 0.92929293 0.9469697 0.93434343 0.93181818 0.93939394 0.92676768] mean value: 0.9345959595959596 key: test_jcc value: [0.83333333 0.79166667 0.8 0.73076923 0.86956522 0.76923077 0.79166667 0.95454545 0.83333333 0.91666667] mean value: 0.8290777338603426 key: train_jcc value: [0.88095238 0.87906977 0.87383178 0.87735849 0.86915888 0.89855072 0.87793427 0.87323944 0.88679245 0.86448598] mean value: 0.8781374160862355 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic 
Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.40553069 0.27984118 0.32102966 0.60245919 0.34661484 0.62287879 0.34234953 0.59961581 0.49291015 0.4358778 ] mean value: 0.44491076469421387 key: score_time value: [0.01224136 0.02761769 0.02338171 0.01102257 0.03149271 0.0273664 0.01604629 0.02581143 0.03535819 0.01974392] mean value: 0.023008227348327637 key: test_mcc value: [0.81818182 0.77352678 0.77352678 0.64715023 0.86452993 0.73029674 0.77352678 0.95553309 0.81818182 0.91287093] mean value: 0.8067324910067508 key: train_mcc value: [0.87374852 0.8693968 0.86391186 0.82332683 0.85876112 0.89404202 0.86886419 0.86373551 0.87896726 0.85363334] mean value: 0.8648387451125236 key: test_accuracy value: [0.90909091 0.88636364 0.88636364 0.81818182 0.93181818 0.86363636 0.88636364 0.97727273 0.90909091 0.95454545] mean value: 0.9022727272727272 key: train_accuracy value: [0.93686869 0.93434343 0.93181818 0.91161616 0.92929293 0.9469697 0.93434343 0.93181818 0.93939394 0.92676768] mean value: 0.9323232323232323 key: test_fscore value: [0.90909091 0.88372093 0.88888889 0.83333333 0.93023256 0.86956522 0.88372093 0.97674419 0.90909091 0.95652174] mean value: 0.9040909601576942 key: train_fscore value: [0.93670886 0.93564356 0.93266833 0.91094148 0.93 0.94656489 0.935 0.93233083 0.94 0.9273183 ] mean value: 0.9327176238423159 
key: test_precision value: [0.90909091 0.9047619 0.86956522 0.76923077 0.95238095 0.83333333 0.9047619 1. 0.90909091 0.91666667] mean value: 0.8968882566708654 key: train_precision value: [0.93908629 0.91747573 0.92118227 0.91794872 0.92079208 0.95384615 0.92574257 0.92537313 0.93069307 0.92039801] mean value: 0.9272538027427192 key: test_recall value: [0.90909091 0.86363636 0.90909091 0.90909091 0.90909091 0.90909091 0.86363636 0.95454545 0.90909091 1. ] mean value: 0.9136363636363636 key: train_recall value: [0.93434343 0.95454545 0.94444444 0.9040404 0.93939394 0.93939394 0.94444444 0.93939394 0.94949495 0.93434343] mean value: 0.9383838383838384 key: test_roc_auc value: [0.90909091 0.88636364 0.88636364 0.81818182 0.93181818 0.86363636 0.88636364 0.97727273 0.90909091 0.95454545] mean value: 0.9022727272727273 key: train_roc_auc value: [0.93686869 0.93434343 0.93181818 0.91161616 0.92929293 0.9469697 0.93434343 0.93181818 0.93939394 0.92676768] mean value: 0.9323232323232323 key: test_jcc value: [0.83333333 0.79166667 0.8 0.71428571 0.86956522 0.76923077 0.79166667 0.95454545 0.83333333 0.91666667] mean value: 0.8274293822119909 key: train_jcc value: [0.88095238 0.87906977 0.87383178 0.8364486 0.86915888 0.89855072 0.87793427 0.87323944 0.88679245 0.86448598] mean value: 0.8740464268427159 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( 
Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.17021847 0.0488503 0.04700518 0.10988641 0.11036181 0.18732977 0.1769886 0.13736916 0.13100767 0.08153677] mean value: 0.12005541324615479 key: score_time value: [0.01927829 0.01233315 0.01237607 0.02285433 0.01247287 0.02293801 0.01881552 0.01801705 0.01239252 0.02268243] mean value: 0.01741602420806885 key: test_mcc value: [0.91452919 0.91106719 1. 0.95652174 0.77865613 0.82506438 0.68911026 0.64426877 0.74410286 0.68972332] mean value: 0.8153043839779892 key: train_mcc value: [0.862096 0.86750864 0.84716163 0.85188889 0.87164354 0.85185095 0.85680144 0.88165855 0.87664317 0.86676585] mean value: 0.8634018662617092 key: test_accuracy value: [0.95555556 0.95555556 1. 
0.97777778 0.88888889 0.91111111 0.84444444 0.82222222 0.86666667 0.84444444] mean value: 0.9066666666666666 key: train_accuracy value: [0.9308642 0.93333333 0.92345679 0.92592593 0.93580247 0.92592593 0.92839506 0.94074074 0.9382716 0.93333333] mean value: 0.931604938271605 key: test_fscore value: [0.95238095 0.95454545 1. 0.97777778 0.88888889 0.91666667 0.85106383 0.82608696 0.88 0.84444444] mean value: 0.9091854971013158 key: train_fscore value: [0.93203883 0.93493976 0.92457421 0.92647059 0.93627451 0.92574257 0.92839506 0.94117647 0.93857494 0.93366093] mean value: 0.9321847880082487 key: test_precision value: [1. 0.95454545 1. 0.95652174 0.86956522 0.88 0.83333333 0.82608696 0.81481481 0.86363636] mean value: 0.8998503879373445 key: train_precision value: [0.91866029 0.91509434 0.91346154 0.92195122 0.93170732 0.92574257 0.92610837 0.93203883 0.93170732 0.92682927] mean value: 0.9243301070709858 key: test_recall value: [0.90909091 0.95454545 1. 1. 0.90909091 0.95652174 0.86956522 0.82608696 0.95652174 0.82608696] mean value: 0.9207509881422925 key: train_recall value: [0.94581281 0.95566502 0.93596059 0.93103448 0.9408867 0.92574257 0.93069307 0.95049505 0.94554455 0.94059406] mean value: 0.9402428912842022 key: test_roc_auc value: [0.95454545 0.9555336 1. 0.97826087 0.88932806 0.91007905 0.84387352 0.82213439 0.86462451 0.84486166] mean value: 0.9063241106719367 key: train_roc_auc value: [0.9308272 0.93327806 0.92342584 0.92591328 0.93578988 0.92592547 0.92840072 0.94076477 0.93828952 0.93335122] mean value: 0.9315965956201532 key: test_jcc value: [0.90909091 0.91304348 1. 
0.95652174 0.8 0.84615385 0.74074074 0.7037037 0.78571429 0.73076923] mean value: 0.838573793356402 key: train_jcc value: [0.87272727 0.87782805 0.85972851 0.8630137 0.88018433 0.86175115 0.86635945 0.88888889 0.88425926 0.87557604] mean value: 0.8730316648333466 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [2.99451971 2.03133249 2.58804703 2.36343884 2.9534328 3.19177508 2.325104 2.89092827 3.53216934 2.62339878] mean value: 2.74941463470459 key: score_time value: [0.02300715 0.01974368 0.01187634 0.01202106 0.01932669 0.02432084 0.02360487 0.04886913 0.01317501 0.03754377] mean value: 0.02334885597229004 key: test_mcc value: [0.91452919 0.86732843 1. 
0.95652174 0.77865613 0.82506438 0.73559956 0.68972332 0.74410286 0.77865613] mean value: 0.8290181733629296 key: train_mcc value: [0.89656272 0.89152603 0.81736586 0.87655164 0.90127552 0.88152087 0.90618446 0.95556639 0.89140349 0.89639783] mean value: 0.8914354807454282 key: test_accuracy value: [0.95555556 0.93333333 1. 0.97777778 0.88888889 0.91111111 0.86666667 0.84444444 0.86666667 0.88888889] mean value: 0.9133333333333333 key: train_accuracy value: [0.94814815 0.94567901 0.90864198 0.9382716 0.95061728 0.94074074 0.95308642 0.97777778 0.94567901 0.94814815] mean value: 0.945679012345679 key: test_fscore value: [0.95238095 0.93023256 1. 0.97777778 0.88888889 0.91666667 0.875 0.84444444 0.88 0.88888889] mean value: 0.9154280177187154 key: train_fscore value: [0.94890511 0.94634146 0.90953545 0.93857494 0.95098039 0.94029851 0.95308642 0.97766749 0.94581281 0.94840295] mean value: 0.9459605533255245 key: test_precision value: [1. 0.95238095 1. 0.95652174 0.86956522 0.88 0.84 0.86363636 0.81481481 0.90909091] mean value: 0.9086009996444779 key: train_precision value: [0.9375 0.93719807 0.90291262 0.93627451 0.94634146 0.945 0.95073892 0.9800995 0.94117647 0.94146341] mean value: 0.9418704966176731 key: test_recall value: [0.90909091 0.90909091 1. 1. 0.90909091 0.95652174 0.91304348 0.82608696 0.95652174 0.86956522] mean value: 0.924901185770751 key: train_recall value: [0.96059113 0.95566502 0.91625616 0.9408867 0.95566502 0.93564356 0.95544554 0.97524752 0.95049505 0.95544554] mean value: 0.9501341267131639 key: test_roc_auc value: [0.95454545 0.93280632 1. 0.97826087 0.88932806 0.91007905 0.86561265 0.84486166 0.86462451 0.88932806] mean value: 0.9129446640316206 key: train_roc_auc value: [0.94811735 0.94565429 0.90862313 0.93826513 0.95060479 0.94072819 0.95309223 0.97777155 0.94569087 0.94816612] mean value: 0.9456713651660732 key: test_jcc value: [0.90909091 0.86956522 1. 
0.95652174 0.8 0.84615385 0.77777778 0.73076923 0.78571429 0.8 ] mean value: 0.8475593006027788 key: train_jcc value: [0.90277778 0.89814815 0.83408072 0.88425926 0.90654206 0.88732394 0.91037736 0.95631068 0.89719626 0.90186916] mean value: 0.8978885361073676 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01388836 0.01219487 0.01202846 0.01193786 0.01194143 0.01207137 0.01265931 0.01293683 0.01277041 0.01270103] mean value: 0.012512993812561036 key: score_time value: [0.01055479 0.01051068 0.01049137 0.01057005 0.01054406 0.01070571 0.01106882 0.01103139 0.01090407 0.01119542] mean value: 0.010757637023925782 key: test_mcc value: [0.73320158 0.70501339 0.55533597 0.72299881 0.62869461 0.77821935 0.3860278 0.60637261 0.64426877 0.60637261] mean value: 0.6366505501355761 key: train_mcc value: [0.69394577 0.68810424 0.64177606 0.65988684 0.66444098 0.69047787 0.69047787 0.69787618 0.68793807 0.68334493] mean value: 0.6798268830360812 key: test_accuracy value: [0.86666667 0.84444444 0.77777778 0.84444444 0.8 0.88888889 0.68888889 0.8 0.82222222 0.8 ] mean value: 0.8133333333333334 key: train_accuracy value: [0.84691358 0.84197531 0.81728395 0.82716049 0.82962963 
0.84197531 0.84197531 0.84691358 0.84197531 0.83950617] mean value: 0.8375308641975309 key: test_fscore value: [0.86363636 0.82051282 0.77272727 0.81081081 0.75675676 0.89361702 0.66666667 0.79069767 0.82608696 0.79069767] mean value: 0.7992210017746235 key: train_fscore value: [0.84878049 0.83333333 0.80319149 0.81578947 0.81889764 0.82978723 0.82978723 0.83769634 0.83246073 0.82939633] mean value: 0.8279120283586651 key: test_precision value: [0.86363636 0.94117647 0.77272727 1. 0.93333333 0.875 0.73684211 0.85 0.82608696 0.85 ] mean value: 0.8648802502070102 key: train_precision value: [0.84057971 0.8839779 0.87283237 0.87570621 0.87640449 0.89655172 0.89655172 0.88888889 0.88333333 0.88268156] mean value: 0.8797507924454793 key: test_recall value: [0.86363636 0.72727273 0.77272727 0.68181818 0.63636364 0.91304348 0.60869565 0.73913043 0.82608696 0.73913043] mean value: 0.7507905138339921 key: train_recall value: [0.85714286 0.78817734 0.74384236 0.7635468 0.76847291 0.77227723 0.77227723 0.79207921 0.78712871 0.78217822] mean value: 0.7827122860069258 key: test_roc_auc value: [0.86660079 0.84189723 0.77766798 0.84090909 0.79644269 0.88833992 0.69071146 0.8013834 0.82213439 0.8013834 ] mean value: 0.8127470355731226 key: train_roc_auc value: [0.84688826 0.84210847 0.81746574 0.82731795 0.82978101 0.84180364 0.84180364 0.84677852 0.84184022 0.83936497] mean value: 0.8375152416719505 key: test_jcc value: [0.76 0.69565217 0.62962963 0.68181818 0.60869565 0.80769231 0.5 0.65384615 0.7037037 0.65384615] mean value: 0.6694883956623087 key: train_jcc value: [0.73728814 0.71428571 0.67111111 0.68888889 0.69333333 0.70909091 0.70909091 0.72072072 0.71300448 0.70852018] mean value: 0.7065334385791937 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', 
GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), 
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01217747 0.01220322 0.01207113 0.01213813 0.01209164 0.01210999 0.01217937 0.01207447 0.01208973 0.01212859] mean value: 0.01212637424468994 key: score_time value: [0.01052928 0.01042199 0.01046658 0.01045585 0.01039958 0.01040077 0.01052475 0.01056433 0.01042843 0.01073074] mean value: 0.010492229461669922 key: test_mcc value: [0.64613475 0.78405645 0.82213439 0.86758893 0.5169078 0.64613475 0.68911026 0.51089209 0.73320158 0.51185771] mean value: 0.6728018722919369 key: train_mcc value: [0.71373171 0.71391286 0.70403264 0.72839898 0.76296152 0.72399345 0.73363435 0.73363435 0.718529 0.75311563] mean value: 0.7285944480486638 key: test_accuracy value: [0.82222222 0.88888889 0.91111111 0.93333333 0.75555556 0.82222222 0.84444444 0.75555556 0.86666667 0.75555556] mean value: 0.8355555555555555 key: train_accuracy value: [0.85679012 0.85679012 0.85185185 0.86419753 0.88148148 0.8617284 0.86666667 0.86666667 0.85925926 0.87654321] mean value: 0.8641975308641976 key: test_fscore value: [0.80952381 0.87804878 0.90909091 0.93333333 0.76595745 0.83333333 0.85106383 0.76595745 0.86956522 0.75555556] mean value: 0.8371429662120305 key: train_fscore value: [0.85572139 0.855 0.85 0.86486486 0.8817734 0.85858586 0.86432161 0.86432161 0.85925926 0.87562189] mean value: 0.8629469881387253 key: test_precision value: [0.85 0.94736842 0.90909091 0.91304348 0.72 0.8 0.83333333 0.75 0.86956522 
0.77272727] mean value: 0.836512863185632 key: train_precision value: [0.86432161 0.8680203 0.86294416 0.8627451 0.8817734 0.87628866 0.87755102 0.87755102 0.85714286 0.88 ] mean value: 0.8708338129852269 key: test_recall value: [0.77272727 0.81818182 0.90909091 0.95454545 0.81818182 0.86956522 0.86956522 0.7826087 0.86956522 0.73913043] mean value: 0.8403162055335969 key: train_recall value: [0.84729064 0.84236453 0.83743842 0.86699507 0.8817734 0.84158416 0.85148515 0.85148515 0.86138614 0.87128713] mean value: 0.8553089791737795 key: test_roc_auc value: [0.82114625 0.88735178 0.91106719 0.93379447 0.756917 0.82114625 0.84387352 0.75494071 0.86660079 0.75592885] mean value: 0.8352766798418972 key: train_roc_auc value: [0.85681364 0.85682583 0.85188753 0.86419061 0.88148076 0.86167878 0.86662927 0.86662927 0.8592645 0.87653026] mean value: 0.8641930449202556 key: test_jcc value: [0.68 0.7826087 0.83333333 0.875 0.62068966 0.71428571 0.74074074 0.62068966 0.76923077 0.60714286] mean value: 0.7243721420730417 key: train_jcc value: [0.74782609 0.74672489 0.73913043 0.76190476 0.78854626 0.75221239 0.76106195 0.76106195 0.75324675 0.77876106] mean value: 0.7590476528359691 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01150799 0.01243472 0.01137662 0.01154757 0.01141047 0.01107907 0.01314235 0.01315427 0.01413846 0.03310156] mean value: 0.014289307594299316 key: score_time value: [0.02538848 0.02856398 0.02112675 0.02574253 0.02009082 0.01918626 0.05908275 0.04989839 0.05723858 0.07202363] mean value: 0.03783421516418457 key: test_mcc value: [0.46640316 0.68972332 0.60079051 0.60000118 0.24655092 0.3860278 0.55666994 0.33399209 0.55533597 0.42744299] mean value: 0.4862937886203532 key: train_mcc value: [0.68398976 0.68932545 0.68482256 0.65994656 0.70428051 0.68422603 0.72399345 0.67997157 0.65931708 0.72375269] mean value: 0.6893625650465149 key: test_accuracy value: [0.73333333 0.84444444 0.8 0.8 0.62222222 0.68888889 0.77777778 0.66666667 0.77777778 0.71111111] mean value: 0.7422222222222222 key: train_accuracy value: [0.84197531 0.84444444 0.84197531 0.82962963 0.85185185 0.84197531 0.8617284 0.83950617 0.82962963 0.8617284 ] mean value: 0.8444444444444444 key: test_fscore value: [0.72727273 0.84444444 0.8 0.79069767 0.56410256 0.66666667 0.79166667 0.66666667 0.7826087 0.69767442] mean value: 0.7331800524495166 key: train_fscore value: [0.84158416 0.84210526 0.83838384 0.82619647 0.84924623 0.83919598 0.85858586 0.8346056 0.82793017 0.85929648] mean value: 0.8417130058090375 key: test_precision value: [0.72727273 0.82608696 0.7826087 0.80952381 0.64705882 0.73684211 0.76 0.68181818 0.7826087 0.75 ] mean value: 0.7503819995233375 key: train_precision value: [0.84577114 0.85714286 0.86010363 0.84536082 0.86666667 0.85204082 0.87628866 0.85863874 0.83417085 0.87244898] 
mean value: 0.856863317321244 key: test_recall value: [0.72727273 0.86363636 0.81818182 0.77272727 0.5 0.60869565 0.82608696 0.65217391 0.7826087 0.65217391] mean value: 0.7203557312252965 key: train_recall value: [0.83743842 0.82758621 0.81773399 0.80788177 0.83251232 0.82673267 0.84158416 0.81188119 0.82178218 0.84653465] mean value: 0.8271667560844754 key: test_roc_auc value: [0.73320158 0.84486166 0.80039526 0.79940711 0.61956522 0.69071146 0.77667984 0.66699605 0.77766798 0.71245059] mean value: 0.742193675889328 key: train_roc_auc value: [0.84198654 0.84448617 0.84203531 0.82968346 0.85189972 0.84193777 0.86167878 0.83943813 0.8296103 0.86169097] mean value: 0.8444447154075013 key: test_jcc value: [0.57142857 0.73076923 0.66666667 0.65384615 0.39285714 0.5 0.65517241 0.5 0.64285714 0.53571429] mean value: 0.5849311607932297 key: train_jcc value: [0.72649573 0.72727273 0.72173913 0.70386266 0.73799127 0.72294372 0.75221239 0.71615721 0.70638298 0.75330396] mean value: 0.726836177256853 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.04429531 0.05560946 0.05552363 0.02630138 0.01901889 0.01773357 0.0175724 0.01781559 0.01794982 0.01811028] mean value: 0.02899303436279297 key: score_time value: [0.04254222 0.02727342 0.02701807 0.0157671 0.01172519 0.01093841 0.01104927 0.01114202 0.01128674 0.01135397] mean value: 0.018009638786315917 key: test_mcc value: [0.83484711 0.91452919 1. 0.91106719 0.73663511 0.82506438 0.68972332 0.64426877 0.78405645 0.64426877] mean value: 0.7984460299700171 key: train_mcc value: [0.7927359 0.80280601 0.78773172 0.78766004 0.80741843 0.80741843 0.81238873 0.81234453 0.81234453 0.81234453] mean value: 0.8035192858986034 key: test_accuracy value: [0.91111111 0.95555556 1. 0.95555556 0.86666667 0.91111111 0.84444444 0.82222222 0.88888889 0.82222222] mean value: 0.8977777777777778 key: train_accuracy value: [0.8962963 0.90123457 0.89382716 0.89382716 0.9037037 0.9037037 0.90617284 0.90617284 0.90617284 0.90617284] mean value: 0.9017283950617284 key: test_fscore value: [0.9 0.95238095 1. 0.95454545 0.86956522 0.91666667 0.84444444 0.82608696 0.89795918 0.82608696] mean value: 0.898773583214577 key: train_fscore value: [0.89756098 0.90291262 0.89486553 0.89434889 0.9037037 0.9037037 0.90640394 0.90594059 0.90594059 0.90594059] mean value: 0.9021321147462571 key: test_precision value: [1. 1. 1. 
0.95454545 0.83333333 0.88 0.86363636 0.82608696 0.84615385 0.82608696] mean value: 0.9029842910712476 key: train_precision value: [0.88888889 0.88995215 0.88834951 0.89215686 0.90594059 0.90147783 0.90196078 0.90594059 0.90594059 0.90594059] mean value: 0.8986548412370806 key: test_recall value: [0.81818182 0.90909091 1. 0.95454545 0.90909091 0.95652174 0.82608696 0.82608696 0.95652174 0.82608696] mean value: 0.8982213438735178 key: train_recall value: [0.90640394 0.91625616 0.90147783 0.89655172 0.90147783 0.90594059 0.91089109 0.90594059 0.90594059 0.90594059] mean value: 0.9056820953031264 key: test_roc_auc value: [0.90909091 0.95454545 1. 0.9555336 0.86758893 0.91007905 0.84486166 0.82213439 0.88735178 0.82213439] mean value: 0.8973320158102767 key: train_roc_auc value: [0.89627128 0.90119739 0.89380822 0.89382042 0.90370921 0.90370921 0.90618446 0.90617227 0.90617227 0.90617227] mean value: 0.9017216992635224 key: test_jcc value: [0.81818182 0.90909091 1. 0.91304348 0.76923077 0.84615385 0.73076923 0.7037037 0.81481481 0.7037037 ] mean value: 0.8208692273909666 key: train_jcc value: [0.81415929 0.82300885 0.80973451 0.80888889 0.82432432 0.82432432 0.82882883 0.8280543 0.8280543 0.8280543 ] mean value: 0.8217431917161224 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.16594219 1.53831148 0.54525065 0.66455746 0.58896971 1.97863793 1.58019543 2.03711033 1.07067323 0.89870596] mean value: 1.206835436820984 key: score_time value: [0.02138186 0.01350737 0.01318359 0.01283979 0.02119589 0.02027488 0.02557015 0.01267719 0.0126822 0.0126617 ] mean value: 0.016597461700439454 key: test_mcc value: [0.83484711 0.91106719 0.95652174 0.95652174 0.68911026 0.86732843 0.60079051 0.68911026 0.61657545 0.56604076] mean value: 0.7687913462185909 key: train_mcc value: [0.81736586 0.86188899 0.80263415 0.81331421 0.81271657 0.80904514 0.83406549 0.86381736 0.84818518 0.86717283] mean value: 0.8330205780457784 key: test_accuracy value: [0.91111111 0.95555556 0.97777778 0.97777778 0.84444444 0.93333333 0.8 0.84444444 0.77777778 0.77777778] mean value: 0.88 key: train_accuracy value: [0.90864198 0.9308642 0.90123457 0.90617284 0.9037037 0.9037037 0.91604938 0.9308642 0.92345679 0.93333333] mean value: 0.9158024691358024 key: test_fscore value: [0.9 0.95454545 0.97777778 0.97777778 0.8372093 0.93617021 0.8 0.85106383 0.82142857 0.76190476] mean value: 0.8817877688313116 key: train_fscore value: [0.90953545 0.93170732 0.90049751 0.90865385 0.89817232 0.90025575 0.91282051 0.93301435 
0.9253012 0.93198992] mean value: 0.9151948202363086 key: test_precision value: [1. 0.95454545 0.95652174 0.95652174 0.85714286 0.91666667 0.81818182 0.83333333 0.6969697 0.84210526] mean value: 0.8831988568258591 key: train_precision value: [0.90291262 0.92270531 0.90954774 0.88732394 0.95555556 0.93121693 0.94680851 0.90277778 0.90140845 0.94871795] mean value: 0.9208974792335061 key: test_recall value: [0.81818182 0.95454545 1. 1. 0.81818182 0.95652174 0.7826087 0.86956522 1. 0.69565217] mean value: 0.8895256916996047 key: train_recall value: [0.91625616 0.9408867 0.89162562 0.93103448 0.84729064 0.87128713 0.88118812 0.96534653 0.95049505 0.91584158] mean value: 0.9111252011900697 key: test_roc_auc value: [0.90909091 0.9555336 0.97826087 0.97826087 0.84387352 0.93280632 0.80039526 0.84387352 0.77272727 0.77964427] mean value: 0.8794466403162056 key: train_roc_auc value: [0.90862313 0.93083939 0.90125835 0.9061113 0.90384334 0.90362386 0.91596352 0.93094913 0.92352339 0.93329025] mean value: 0.9158025654782227 key: test_jcc value: [0.81818182 0.91304348 0.95652174 0.95652174 0.72 0.88 0.66666667 0.74074074 0.6969697 0.61538462] mean value: 0.7964030494465277 key: train_jcc value: [0.83408072 0.87214612 0.81900452 0.83259912 0.81516588 0.81860465 0.83962264 0.87443946 0.86098655 0.87264151] mean value: 0.8439291167891907 MCC on Blind test: 0.75 Accuracy on Blind test: 0.88 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random 
Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02708101 0.02197504 0.0223949 0.02293539 0.02220106 0.01959419 0.02083325 0.01980853 0.01847649 0.02072978] mean value: 0.021602964401245116 key: score_time value: [0.01235509 0.00963283 0.01003337 0.01024127 0.00966859 0.0089643 0.00935817 0.00911379 0.00966048 0.00908685] mean value: 0.009811472892761231 key: test_mcc value: [0.82506438 0.82213439 0.91106719 0.95643752 0.87476705 0.95643752 0.91106719 0.86732843 0.74605372 0.78530224] mean value: 0.8655659627181006 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91111111 0.91111111 0.95555556 0.97777778 0.93333333 0.97777778 0.95555556 0.93333333 0.86666667 0.88888889] mean value: 0.9311111111111111 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9047619 0.90909091 0.95454545 0.97674419 0.93617021 0.9787234 0.95652174 0.93617021 0.85714286 0.88372093] mean value: 0.9293591810737865 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95 0.90909091 0.95454545 1. 0.88 0.95833333 0.95652174 0.91666667 0.94736842 0.95 ] mean value: 0.942252652381943 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86363636 0.90909091 0.95454545 0.95454545 1. 1. 0.95652174 0.95652174 0.7826087 0.82608696] mean value: 0.9203557312252965 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.91007905 0.91106719 0.9555336 0.97727273 0.93478261 0.97727273 0.9555336 0.93280632 0.86857708 0.89031621] mean value: 0.9313241106719368 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.82608696 0.83333333 0.91304348 0.95454545 0.88 0.95833333 0.91666667 0.88 0.75 0.79166667] mean value: 0.8703675889328063 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.11638784 0.12223196 0.12821341 0.12665248 0.12380767 0.12398219 0.12610722 0.1174705 0.12105989 0.12040401] mean value: 0.12263171672821045 key: score_time value: [0.01924944 0.01850915 0.01916957 0.01890802 0.01916337 0.01787496 0.01930618 0.0202477 0.01806045 0.01934838] mean value: 0.01898372173309326 key: test_mcc value: [0.86732843 0.95643752 0.91106719 0.95652174 0.73320158 0.73559956 0.86732843 0.68972332 0.83484711 0.55841694] mean value: 0.8110471829947312 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.93333333 0.97777778 0.95555556 0.97777778 0.86666667 0.86666667 0.93333333 0.84444444 0.91111111 0.77777778] mean value: 0.9044444444444445 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93023256 0.97674419 0.95454545 0.97777778 0.86363636 0.875 0.93617021 0.84444444 0.92 0.77272727] mean value: 0.9051278270083317 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.95238095 1. 0.95454545 0.95652174 0.86363636 0.84 0.91666667 0.86363636 0.85185185 0.80952381] mean value: 0.9008763201371897 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.95454545 0.95454545 1. 0.86363636 0.91304348 0.95652174 0.82608696 1. 0.73913043] mean value: 0.9116600790513834 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93280632 0.97727273 0.9555336 0.97826087 0.86660079 0.86561265 0.93280632 0.84486166 0.90909091 0.77865613] mean value: 0.9041501976284585 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86956522 0.95454545 0.91304348 0.95652174 0.76 0.77777778 0.88 0.73076923 0.85185185 0.62962963] mean value: 0.8323704379356553 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.8 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01029634 0.01115966 0.01019788 0.01129127 0.01026201 0.0103786 0.01133752 0.01088929 0.01124072 0.01106381] mean value: 0.010811710357666015 key: score_time value: [0.00888944 0.00963187 0.00929451 0.00969863 0.0097034 0.00970888 0.00967789 0.0094142 0.00970745 0.00901723] mean value: 0.009474349021911622 key: test_mcc value: [0.52631666 0.60000118 0.77865613 0.60000118 0.24356483 0.64613475 0.33402405 0.43557241 0.77821935 0.51185771] mean value: 0.5454348232923768 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75555556 0.8 0.88888889 0.8 0.62222222 0.82222222 0.66666667 0.71111111 0.88888889 0.75555556] mean value: 0.7711111111111111 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7755102 0.79069767 0.88888889 0.79069767 0.60465116 0.83333333 0.69387755 0.68292683 0.89361702 0.75555556] mean value: 0.7709755895052613 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.7037037 0.80952381 0.86956522 0.80952381 0.61904762 0.8 0.65384615 0.77777778 0.875 0.77272727] mean value: 0.769071536354145 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86363636 0.77272727 0.90909091 0.77272727 0.59090909 0.86956522 0.73913043 0.60869565 0.91304348 0.73913043] mean value: 0.7778656126482213 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75790514 0.79940711 0.88932806 0.79940711 0.6215415 0.82114625 0.66501976 0.71343874 0.88833992 0.75592885] mean value: 0.7711462450592885 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.63333333 0.65384615 0.8 0.65384615 0.43333333 0.71428571 0.53125 0.51851852 0.80769231 0.60714286] mean value: 0.6353248371998372 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.55 Accuracy on Blind test: 0.77 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.70288324 4.30727792 2.46852684 2.61855292 2.55521131 2.43811393 2.36803675 1.75058436 2.72210979 2.69686818] mean value: 2.6628165245056152 key: score_time value: [0.25447559 0.19142795 0.16145873 0.14909005 0.13192844 0.12770748 0.09348536 0.12782025 0.173311 0.17790127] mean value: 0.15886061191558837 key: test_mcc value: [0.91452919 0.95643752 0.95652174 1. 0.86758893 0.91452919 0.82506438 0.82213439 0.95643752 0.82574419] mean value: 0.9038987043172344 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95555556 0.97777778 0.97777778 1. 0.93333333 0.95555556 0.91111111 0.91111111 0.97777778 0.91111111] mean value: 0.9511111111111111 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95238095 0.97674419 0.97777778 1. 0.93333333 0.95833333 0.91666667 0.91304348 0.9787234 0.90909091] mean value: 0.9516094041145673 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.95652174 1. 0.91304348 0.92 0.88 0.91304348 0.95833333 0.95238095] mean value: 0.949332298136646 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.95454545 1. 1. 0.95454545 1. 0.95652174 0.91304348 1. 0.86956522] mean value: 0.9557312252964427 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.97727273 0.97826087 1. 
0.93379447 0.95454545 0.91007905 0.91106719 0.97727273 0.91205534] mean value: 0.9508893280632411 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90909091 0.95454545 0.95652174 1. 0.875 0.92 0.84615385 0.84 0.95833333 0.83333333] mean value: 0.9092978615587312 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.95 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0...05', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.58991671 1.86406493 2.03459716 1.74722981 2.07112956 2.12889957 1.83267713 1.98935246 1.70792747 2.10284638] mean value: 1.9068641185760498 key: score_time value: [0.21701336 0.21829891 0.22647643 0.18004084 0.16667223 0.17558956 0.19052815 0.21188164 0.21953034 0.15258312] mean value: 0.19586145877838135 key: test_mcc value: [0.91452919 0.95643752 0.95652174 1. 0.86758893 0.91452919 0.77821935 0.82213439 0.91452919 0.69583743] mean value: 0.8820326916601581 key: train_mcc value: [0.95556639 0.95061698 0.95061698 0.94569087 0.95556748 0.9457805 0.94568955 0.96053948 0.95066215 0.96049359] mean value: 0.9521223971905592 key: test_accuracy value: [0.95555556 0.97777778 0.97777778 1. 0.93333333 0.95555556 0.88888889 0.91111111 0.95555556 0.84444444] mean value: 0.94 key: train_accuracy value: [0.97777778 0.97530864 0.97530864 0.97283951 0.97777778 0.97283951 0.97283951 0.98024691 0.97530864 0.98024691] mean value: 0.9760493827160494 key: test_fscore value: [0.95238095 0.97674419 0.97777778 1. 0.93333333 0.95833333 0.89361702 0.91304348 0.95833333 0.8372093 ] mean value: 0.9400772718068289 key: train_fscore value: [0.97788698 0.97536946 0.97536946 0.97283951 0.97777778 0.97256858 0.97270471 0.9800995 0.97512438 0.98019802] mean value: 0.9759938371686563 key: test_precision value: [1. 1. 0.95652174 1. 
0.91304348 0.92 0.875 0.91304348 0.92 0.9 ] mean value: 0.9397608695652174 key: train_precision value: [0.9754902 0.97536946 0.97536946 0.97524752 0.98019802 0.9798995 0.97512438 0.985 0.98 0.98019802] mean value: 0.9781896552287914 key: test_recall value: [0.90909091 0.95454545 1. 1. 0.95454545 1. 0.91304348 0.91304348 1. 0.7826087 ] mean value: 0.9426877470355731 key: train_recall value: [0.98029557 0.97536946 0.97536946 0.97044335 0.97536946 0.96534653 0.97029703 0.97524752 0.97029703 0.98019802] mean value: 0.9738233429254255 key: test_roc_auc value: [0.95454545 0.97727273 0.97826087 1. 0.93379447 0.95454545 0.88833992 0.91106719 0.95454545 0.8458498 ] mean value: 0.9398221343873517 key: train_roc_auc value: [0.97777155 0.97530849 0.97530849 0.97284544 0.97778374 0.97282105 0.97283324 0.9802346 0.9752963 0.98024679] mean value: 0.9760449690289226 key: test_jcc value: [0.90909091 0.95454545 0.95652174 1. 0.875 0.92 0.80769231 0.84 0.92 0.72 ] mean value: 0.8902850410459107 key: train_jcc value: [0.95673077 0.95192308 0.95192308 0.94711538 0.95652174 0.94660194 0.9468599 0.96097561 0.95145631 0.96116505] mean value: 0.9531272860931357 MCC on Blind test: 0.88 Accuracy on Blind test: 0.94 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), 
('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.03431368 0.02331161 0.01544642 0.01508474 0.01589894 0.01519084 0.01523089 0.02336693 0.02334094 0.02675939] mean value: 0.020794439315795898 key: score_time value: [0.02263975 0.015486 0.01317906 0.02044368 0.01301241 0.02103996 0.01305509 0.01366353 0.01381397 0.01300645] mean value: 0.015933990478515625 key: test_mcc value: [0.64613475 0.78405645 0.82213439 0.86758893 0.5169078 0.64613475 0.68911026 0.51089209 0.73320158 0.51185771] mean value: 0.6728018722919369 key: train_mcc value: [0.71373171 0.71391286 0.70403264 0.72839898 0.76296152 0.72399345 0.73363435 0.73363435 0.718529 0.75311563] mean value: 0.7285944480486638 key: test_accuracy value: [0.82222222 0.88888889 0.91111111 0.93333333 0.75555556 0.82222222 0.84444444 0.75555556 0.86666667 0.75555556] mean value: 0.8355555555555555 key: train_accuracy value: [0.85679012 0.85679012 0.85185185 0.86419753 0.88148148 0.8617284 0.86666667 0.86666667 0.85925926 0.87654321] mean value: 0.8641975308641976 key: test_fscore value: [0.80952381 0.87804878 0.90909091 0.93333333 0.76595745 0.83333333 0.85106383 0.76595745 0.86956522 0.75555556] mean value: 0.8371429662120305 key: train_fscore value: [0.85572139 0.855 0.85 0.86486486 0.8817734 0.85858586 0.86432161 0.86432161 0.85925926 0.87562189] mean value: 0.8629469881387253 key: test_precision value: [0.85 0.94736842 0.90909091 0.91304348 0.72 0.8 0.83333333 0.75 0.86956522 0.77272727] mean value: 0.836512863185632 key: train_precision value: [0.86432161 0.8680203 0.86294416 0.8627451 0.8817734 0.87628866 0.87755102 0.87755102 0.85714286 0.88 ] mean value: 
0.8708338129852269 key: test_recall value: [0.77272727 0.81818182 0.90909091 0.95454545 0.81818182 0.86956522 0.86956522 0.7826087 0.86956522 0.73913043] mean value: 0.8403162055335969 key: train_recall value: [0.84729064 0.84236453 0.83743842 0.86699507 0.8817734 0.84158416 0.85148515 0.85148515 0.86138614 0.87128713] mean value: 0.8553089791737795 key: test_roc_auc value: [0.82114625 0.88735178 0.91106719 0.93379447 0.756917 0.82114625 0.84387352 0.75494071 0.86660079 0.75592885] mean value: 0.8352766798418972 key: train_roc_auc value: [0.85681364 0.85682583 0.85188753 0.86419061 0.88148076 0.86167878 0.86662927 0.86662927 0.8592645 0.87653026] mean value: 0.8641930449202556 key: test_jcc value: [0.68 0.7826087 0.83333333 0.875 0.62068966 0.71428571 0.74074074 0.62068966 0.76923077 0.60714286] mean value: 0.7243721420730417 key: train_jcc value: [0.74782609 0.74672489 0.73913043 0.76190476 0.78854626 0.75221239 0.76106195 0.76106195 0.75324675 0.77876106] mean value: 0.7590476528359691 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 
'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC0... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [4.27274823 1.59566879 1.63563442 1.6189487 1.60230374 1.59007549 1.5283258 1.495116 1.52452278 1.55737209] mean value: 1.8420716047286987 key: score_time value: [0.01275396 0.01314974 0.0131228 0.01300788 0.01266623 0.01288438 0.01313043 0.01349545 0.01266217 0.01410031] mean value: 0.013097333908081054 key: test_mcc value: [0.87406293 0.95643752 1. 0.95643752 0.91485328 0.95643752 0.86732843 0.82213439 0.95643752 0.77865613] mean value: 0.908278523558777 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93333333 0.97777778 1. 0.97777778 0.95555556 0.97777778 0.93333333 0.91111111 0.97777778 0.88888889] mean value: 0.9533333333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92682927 0.97674419 1. 0.97674419 0.95652174 0.9787234 0.93617021 0.91304348 0.9787234 0.88888889] mean value: 0.9532388767942495 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 0.91666667 0.95833333 0.91666667 0.91304348 0.95833333 0.90909091] mean value: 0.9572134387351778 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.86363636 0.95454545 1. 0.95454545 1. 1. 0.95652174 0.91304348 1. 0.86956522] mean value: 0.9511857707509881 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93181818 0.97727273 1. 
0.97727273 0.95652174 0.97727273 0.93280632 0.91106719 0.97727273 0.88932806] mean value: 0.9530632411067194 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86363636 0.95454545 1. 0.95454545 0.91666667 0.95833333 0.88 0.84 0.95833333 0.8 ] mean value: 0.9126060606060606 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.95 Accuracy on Blind test: 0.97 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), 
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05823278 0.07864523 0.08365369 0.06540322 0.09378934 0.07705927 0.10100937 0.11681724 0.09124517 0.08683014] mean value: 0.08526854515075684 key: score_time value: [0.02650523 0.02493405 0.01329923 0.02620196 0.03271937 0.02158356 0.02505922 0.02166438 0.02495146 0.02136326] mean value: 0.02382817268371582 key: test_mcc value: [0.82506438 0.73320158 0.91106719 0.86732843 0.73663511 0.77821935 0.68911026 0.73663511 0.55666994 0.64426877] mean value: 0.7478200124484804 key: train_mcc value: [0.93608359 0.91614635 0.90123397 0.91614635 0.9062683 0.92620337 0.92593586 0.91606106 0.91115718 0.91615248] mean value: 0.9171388520151313 key: test_accuracy value: [0.91111111 0.86666667 0.95555556 0.93333333 0.86666667 0.88888889 0.84444444 0.86666667 0.77777778 
0.82222222] mean value: 0.8733333333333333 key: train_accuracy value: [0.96790123 0.95802469 0.95061728 0.95802469 0.95308642 0.96296296 0.96296296 0.95802469 0.95555556 0.95802469] mean value: 0.9585185185185185 key: test_fscore value: [0.9047619 0.86363636 0.95454545 0.93023256 0.86956522 0.89361702 0.85106383 0.86363636 0.79166667 0.82608696] mean value: 0.8748812336363161 key: train_fscore value: [0.96836983 0.95843521 0.95073892 0.95843521 0.95354523 0.96240602 0.96277916 0.95802469 0.95566502 0.95823096] mean value: 0.9586630239446279 key: test_precision value: [0.95 0.86363636 0.95454545 0.95238095 0.83333333 0.875 0.83333333 0.9047619 0.76 0.82608696] mean value: 0.8753078298513082 key: train_precision value: [0.95673077 0.95145631 0.95073892 0.95145631 0.94660194 0.97461929 0.96517413 0.95566502 0.95098039 0.95121951] mean value: 0.9554642596269585 key: test_recall value: [0.86363636 0.86363636 0.95454545 0.90909091 0.90909091 0.91304348 0.86956522 0.82608696 0.82608696 0.82608696] mean value: 0.8760869565217391 key: train_recall value: [0.98029557 0.96551724 0.95073892 0.96551724 0.96059113 0.95049505 0.96039604 0.96039604 0.96039604 0.96534653] mean value: 0.9619689801492465 key: test_roc_auc value: [0.91007905 0.86660079 0.9555336 0.93280632 0.86758893 0.88833992 0.84387352 0.86758893 0.77667984 0.82213439] mean value: 0.8731225296442688 key: train_roc_auc value: [0.96787056 0.95800615 0.95061698 0.95800615 0.95306784 0.96293225 0.96295664 0.95803053 0.95556748 0.95804273] mean value: 0.9585097302833732 key: test_jcc value: [0.82608696 0.76 0.91304348 0.86956522 0.76923077 0.80769231 0.74074074 0.76 0.65517241 0.7037037 ] mean value: 0.7805235587334538 key: train_jcc value: [0.93867925 0.92018779 0.90610329 0.92018779 0.91121495 0.92753623 0.92822967 0.91943128 0.91509434 0.91981132] mean value: 0.9206475908747523 MCC on Blind test: 0.7 Accuracy on Blind test: 0.85 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic 
Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01486421 0.01032543 0.01009607 0.01081514 0.01028252 0.01089168 0.01108813 0.01036739 0.01119781 0.01117921] mean value: 0.011110758781433106 key: score_time value: [0.011132 0.00922537 0.00892806 0.00906157 0.00955486 0.00952983 0.00965738 0.00968456 0.00975657 0.00976205] mean value: 0.009629225730895996 key: test_mcc value: [0.79670588 0.77821935 0.73663511 0.86732843 0.77865613 0.87406293 0.46930785 0.64426877 0.73320158 0.51185771] mean value: 0.7190243743973286 key: train_mcc value: [0.6994877 0.68482256 0.68960241 0.7385111 0.74355351 0.76791201 0.70964919 0.75845593 0.74835945 0.78285689] mean value: 0.7323210752164393 key: test_accuracy value: [0.88888889 0.88888889 0.86666667 0.93333333 0.88888889 0.93333333 0.73333333 0.82222222 0.86666667 0.75555556] mean value: 0.8577777777777778 key: train_accuracy value: [0.84938272 0.84197531 0.84444444 0.8691358 0.87160494 0.88395062 0.85432099 0.87901235 0.87407407 0.89135802] mean value: 0.865925925925926 key: test_fscore value: [0.87179487 0.88372093 0.86956522 0.93023256 0.88888889 0.93877551 0.72727273 0.82608696 0.86956522 0.75555556] mean value: 0.8561458433392566 key: train_fscore value: [0.84634761 0.83838384 0.84130982 0.86783042 0.87 0.88395062 0.84987277 0.87657431 0.87218045 0.89 ] mean value: 0.8636449842307918 key: 
test_precision value: [1. 0.9047619 0.83333333 0.95238095 0.86956522 0.88461538 0.76190476 0.82608696 0.86956522 0.77272727] mean value: 0.8674941001027957 key: train_precision value: [0.86597938 0.86010363 0.86082474 0.87878788 0.88324873 0.8817734 0.87434555 0.89230769 0.88324873 0.8989899 ] mean value: 0.8779609631421748 key: test_recall value: [0.77272727 0.86363636 0.90909091 0.90909091 0.90909091 1. 0.69565217 0.82608696 0.86956522 0.73913043] mean value: 0.8494071146245059 key: train_recall value: [0.82758621 0.81773399 0.8226601 0.85714286 0.85714286 0.88613861 0.82673267 0.86138614 0.86138614 0.88118812] mean value: 0.8499097693020533 key: test_roc_auc value: [0.88636364 0.88833992 0.86758893 0.93280632 0.88932806 0.93181818 0.73418972 0.82213439 0.86660079 0.75592885] mean value: 0.857509881422925 key: train_roc_auc value: [0.84943667 0.84203531 0.84449837 0.86916549 0.87164074 0.88395601 0.85425304 0.87896893 0.87404282 0.89133298] mean value: 0.8659330341901185 key: test_jcc value: [0.77272727 0.79166667 0.76923077 0.86956522 0.8 0.88461538 0.57142857 0.7037037 0.76923077 0.60714286] mean value: 0.7539311212137298 key: train_jcc value: [0.73362445 0.72173913 0.72608696 0.76651982 0.7699115 0.7920354 0.73893805 0.78026906 0.77333333 0.8018018 ] mean value: 0.7604259514076851 MCC on Blind test: 0.77 Accuracy on Blind test: 0.88 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01319623 0.02258158 0.01745749 0.02201819 0.02212381 0.02121663 0.02074075 0.01941252 0.02034569 0.02611756] mean value: 0.020521044731140137 key: score_time value: [0.00947046 0.01199651 0.01194668 0.01197267 0.01198769 0.01198697 0.01204848 0.0119555 0.0119822 0.01206636] mean value: 0.011741352081298829 key: test_mcc value: [0.79670588 0.82213439 0.87406293 0.95643752 0.59725988 0.73663511 0.73559956 0.55362003 0.77865613 0.77865613] mean value: 0.7629767557040962 key: train_mcc value: [0.77582446 0.8644041 0.81918005 0.88888095 0.61326848 0.85131769 0.88257176 0.69882885 0.85568499 0.92648542] mean value: 0.8176446738630777 key: test_accuracy value: [0.88888889 0.91111111 0.93333333 0.97777778 0.77777778 0.86666667 0.86666667 0.75555556 0.88888889 0.88888889] mean value: 0.8755555555555555 key: train_accuracy value: [0.87901235 0.9308642 0.90617284 0.94320988 0.77777778 0.92098765 0.94074074 0.82962963 0.92345679 0.96296296] mean value: 0.9014814814814814 key: test_fscore value: [0.87179487 0.90909091 0.92682927 0.97674419 0.80769231 0.86363636 0.875 0.8 0.88888889 0.88888889] mean value: 0.8808565684331424 key: train_fscore value: [0.86501377 0.93364929 0.9 0.94117647 0.81707317 0.9144385 0.94202899 0.85350318 0.91733333 0.96350365] mean value: 0.904772036038694 key: test_precision value: [1. 0.90909091 1. 1. 
0.7 0.9047619 0.84 0.6875 0.90909091 0.90909091] mean value: 0.8859534632034631 key: train_precision value: [0.98125 0.89954338 0.96610169 0.9787234 0.69550173 0.99418605 0.91981132 0.7472119 0.99421965 0.94736842] mean value: 0.9123917545678761 key: test_recall value: [0.77272727 0.90909091 0.86363636 0.95454545 0.95454545 0.82608696 0.91304348 0.95652174 0.86956522 0.86956522] mean value: 0.8889328063241106 key: train_recall value: [0.77339901 0.97044335 0.84236453 0.90640394 0.99014778 0.84653465 0.96534653 0.9950495 0.85148515 0.98019802] mean value: 0.9121372482075794 key: test_roc_auc value: [0.88636364 0.91106719 0.93181818 0.97727273 0.78162055 0.86758893 0.86561265 0.75098814 0.88932806 0.88932806] mean value: 0.875098814229249 key: train_roc_auc value: [0.87927376 0.93076623 0.90633078 0.94330098 0.77725211 0.92080427 0.94080135 0.83003707 0.92327952 0.96300541] mean value: 0.9014851485148515 key: test_jcc value: [0.77272727 0.83333333 0.86363636 0.95454545 0.67741935 0.76 0.77777778 0.66666667 0.8 0.8 ] mean value: 0.7906106223525579 key: train_jcc value: [0.76213592 0.87555556 0.81818182 0.88888889 0.69072165 0.84236453 0.89041096 0.74444444 0.84729064 0.92957746] mean value: 0.8289571874991976 MCC on Blind test: 0.81 Accuracy on Blind test: 0.9 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01864982 0.01952457 0.01760387 0.018224 0.01664209 0.01940393 0.01955223 0.03549552 0.02013016 0.0204246 ] mean value: 0.020565080642700195 key: score_time value: [0.01202798 0.01201439 0.012398 0.01194167 0.01194906 0.01242661 0.01245832 0.02830505 0.01237369 0.01226449] mean value: 0.013815927505493163 key: test_mcc value: [0.91452919 0.87406293 0.87476705 0.72645449 0.57868151 0.58158 0.73559956 0.57373395 0.69404997 0.55362003] mean value: 0.7107078686833688 key: train_mcc value: [0.80550226 0.82411192 0.82353111 0.62805778 0.7612786 0.64806439 0.9019476 0.77341987 0.90127552 0.66265175] mean value: 0.7729840794536423 key: test_accuracy value: [0.95555556 0.93333333 0.93333333 0.84444444 0.77777778 0.75555556 0.86666667 0.77777778 0.84444444 0.75555556] mean value: 0.8444444444444444 key: train_accuracy value: [0.8962963 0.90617284 0.90864198 0.78518519 0.87160494 0.79753086 0.95061728 0.87654321 0.95061728 0.80493827] mean value: 0.8748148148148148 key: test_fscore value: [0.95238095 0.92682927 0.93617021 0.8627451 0.8 0.80701754 0.875 0.80769231 0.85714286 0.8 ] mean value: 0.8624978240173622 key: train_fscore value: [0.88648649 0.89784946 0.91415313 0.82281059 0.88444444 0.83057851 0.95145631 0.88888889 0.95024876 0.83643892] mean value: 0.8863355507758013 key: test_precision value: [1. 1. 
0.88 0.75862069 0.71428571 0.67647059 0.84 0.72413793 0.80769231 0.6875 ] mean value: 0.8088707230902972 key: train_precision value: [0.98203593 0.98816568 0.86403509 0.70138889 0.80566802 0.71276596 0.93333333 0.80645161 0.955 0.71886121] mean value: 0.8467705715067385 key: test_recall value: [0.90909091 0.86363636 1. 1. 0.90909091 1. 0.91304348 0.91304348 0.91304348 0.95652174] mean value: 0.9377470355731226 key: train_recall value: [0.80788177 0.8226601 0.97044335 0.99507389 0.98029557 0.9950495 0.97029703 0.99009901 0.94554455 1. ] mean value: 0.9477344778812856 key: test_roc_auc value: [0.95454545 0.93181818 0.93478261 0.84782609 0.78063241 0.75 0.86561265 0.77470356 0.84288538 0.75098814] mean value: 0.8433794466403162 key: train_roc_auc value: [0.89651514 0.90637955 0.908489 0.78466566 0.8713359 0.79801736 0.95066576 0.8768229 0.95060479 0.80541872] mean value: 0.8748914792957128 key: test_jcc value: [0.90909091 0.86363636 0.88 0.75862069 0.66666667 0.67647059 0.77777778 0.67741935 0.75 0.66666667] mean value: 0.762634901656756 key: train_jcc value: [0.7961165 0.81463415 0.84188034 0.69896194 0.79282869 0.71024735 0.90740741 0.8 0.90521327 0.71886121] mean value: 0.7986150853388724 MCC on Blind test: 0.75 Accuracy on Blind test: 0.87 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18614674 0.18338513 0.18672371 0.18425417 0.18633842 0.17906499 0.16986871 0.1727941 0.16615391 0.17229795] mean value: 0.17870278358459474 key: score_time value: [0.01630163 0.0177443 0.01695061 0.01684165 0.01702309 0.01606345 0.01537633 0.01550055 0.015517 0.01609135] mean value: 0.016340994834899904 key: test_mcc value: [0.91452919 0.91452919 1. 0.91106719 0.91485328 0.91452919 0.91106719 0.86732843 0.95643752 0.77865613] mean value: 0.9082997309622647 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95555556 0.95555556 1. 0.95555556 0.95555556 0.95555556 0.95555556 0.93333333 0.97777778 0.88888889] mean value: 0.9533333333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95238095 0.95238095 1. 0.95454545 0.95652174 0.95833333 0.95652174 0.93617021 0.9787234 0.88888889] mean value: 0.9534466676811728 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 0.95454545 0.91666667 0.92 0.95652174 0.91666667 0.95833333 0.90909091] mean value: 0.9531824769433466 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.90909091 1. 0.95454545 1. 1. 0.95652174 0.95652174 1. 0.86956522] mean value: 0.9555335968379447 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.95454545 1. 
0.9555336 0.95652174 0.95454545 0.9555336 0.93280632 0.97727273 0.88932806] mean value: 0.9530632411067194 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90909091 0.90909091 1. 0.91304348 0.91666667 0.92 0.91666667 0.88 0.95833333 0.8 ] mean value: 0.9122891963109354 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.93 Accuracy on Blind test: 0.96 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0584321 0.06014061 0.07051253 0.0679574 0.06842804 0.0796442 0.0581665 0.08605957 0.07957792 0.07208705] mean value: 0.07010059356689453 key: score_time value: [0.03084612 0.02727365 0.02405024 0.02450156 0.03062487 0.02380657 0.0292778 0.04444814 0.02564001 0.03785896] mean value: 0.029832792282104493 key: test_mcc value: [0.91452919 0.95643752 0.95643752 1. 
0.86758893 0.95643752 0.91452919 0.86732843 0.95643752 0.77865613] mean value: 0.9168381944162244 key: train_mcc value: [0.98519729 0.98029509 0.99507377 0.9901234 0.98029509 0.97532008 0.98024679 0.99507377 0.9704168 0.98024679] mean value: 0.9832288871744899 key: test_accuracy value: [0.95555556 0.97777778 0.97777778 1. 0.93333333 0.97777778 0.95555556 0.93333333 0.97777778 0.88888889] mean value: 0.9577777777777777 key: train_accuracy value: [0.99259259 0.99012346 0.99753086 0.99506173 0.99012346 0.98765432 0.99012346 0.99753086 0.98518519 0.99012346] mean value: 0.9916049382716049 key: test_fscore value: [0.95238095 0.97674419 0.97674419 1. 0.93333333 0.9787234 0.95833333 0.93617021 0.9787234 0.88888889] mean value: 0.9580041901306127 key: train_fscore value: [0.99259259 0.99009901 0.997543 0.99507389 0.99009901 0.98759305 0.99009901 0.99751861 0.98507463 0.99009901] mean value: 0.9915791810761856 key: test_precision value: [1. 1. 1. 1. 0.91304348 0.95833333 0.92 0.91666667 0.95833333 0.90909091] mean value: 0.9575467720685112 key: train_precision value: [0.9950495 0.99502488 0.99509804 0.99507389 0.99502488 0.99004975 0.99009901 1. 0.99 0.99009901] mean value: 0.993551895808134 key: test_recall value: [0.90909091 0.95454545 0.95454545 1. 0.95454545 1. 1. 0.95652174 1. 0.86956522] mean value: 0.9598814229249012 key: train_recall value: [0.99014778 0.98522167 1. 0.99507389 0.98522167 0.98514851 0.99009901 0.9950495 0.98019802 0.99009901] mean value: 0.9896259084036483 key: test_roc_auc value: [0.95454545 0.97727273 0.97727273 1. 0.93379447 0.97727273 0.95454545 0.93280632 0.97727273 0.88932806] mean value: 0.9574110671936759 key: train_roc_auc value: [0.99259864 0.99013559 0.99752475 0.9950617 0.99013559 0.98764815 0.9901234 0.99752475 0.9851729 0.9901234 ] mean value: 0.9916048870896942 key: test_jcc value: [0.90909091 0.95454545 0.95454545 1. 
0.875 0.95833333 0.92 0.88 0.95833333 0.8 ] mean value: 0.9209848484848485 key: train_jcc value: [0.98529412 0.98039216 0.99509804 0.99019608 0.98039216 0.9754902 0.98039216 0.9950495 0.97058824 0.98039216] mean value: 0.9833284799068142 MCC on Blind test: 0.93 Accuracy on Blind test: 0.96 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.11871982 0.17861295 0.23517871 0.15984797 0.21449161 0.23632669 0.18187809 0.18064666 0.17393255 0.1803596 ] mean value: 0.1859994649887085 key: score_time value: [0.02222514 0.02561641 0.023772 0.02483273 0.02410769 0.02581143 0.02375984 0.02371955 0.02334571 0.03471303] mean value: 0.025190353393554688 key: test_mcc value: [0.65335861 0.73320158 0.77821935 0.86758893 0.46720513 0.38019877 0.65604724 0.46640316 0.73559956 0.42744299] mean value: 0.616526531713881 key: train_mcc value: [0.98529376 0.98529376 0.99017193 0.99017193 0.99507389 0.98529269 0.99017145 1. 
0.98529269 0.98529269] mean value: 0.9892054809931761 key: test_accuracy value: [0.82222222 0.86666667 0.88888889 0.93333333 0.73333333 0.68888889 0.82222222 0.73333333 0.86666667 0.71111111] mean value: 0.8066666666666666 key: train_accuracy value: [0.99259259 0.99259259 0.99506173 0.99506173 0.99753086 0.99259259 0.99506173 1. 0.99259259 0.99259259] mean value: 0.9945679012345678 key: test_fscore value: [0.8 0.86363636 0.88372093 0.93333333 0.71428571 0.68181818 0.80952381 0.73913043 0.875 0.69767442] mean value: 0.7998123186217221 key: train_fscore value: [0.99255583 0.99255583 0.9950495 0.9950495 0.99753086 0.9925187 0.99502488 1. 0.9925187 0.9925187 ] mean value: 0.9945322521977115 key: test_precision value: [0.88888889 0.86363636 0.9047619 0.91304348 0.75 0.71428571 0.89473684 0.73913043 0.84 0.75 ] mean value: 0.8258483626721613 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.86363636 0.86363636 0.95454545 0.68181818 0.65217391 0.73913043 0.73913043 0.91304348 0.65217391] mean value: 0.7786561264822134 key: train_recall value: [0.98522167 0.98522167 0.99014778 0.99014778 0.99507389 0.98514851 0.99009901 1. 0.98514851 0.98514851] mean value: 0.9891357362337219 key: test_roc_auc value: [0.8201581 0.86660079 0.88833992 0.93379447 0.73221344 0.68972332 0.82411067 0.73320158 0.86561265 0.71245059] mean value: 0.8066205533596839 key: train_roc_auc value: [0.99261084 0.99261084 0.99507389 0.99507389 0.99753695 0.99257426 0.9950495 1. 0.99257426 0.99257426] mean value: 0.9945678681168609 key: test_jcc value: [0.66666667 0.76 0.79166667 0.875 0.55555556 0.51724138 0.68 0.5862069 0.77777778 0.53571429] mean value: 0.6745829228243021 key: train_jcc value: [0.98522167 0.98522167 0.99014778 0.99014778 0.99507389 0.98514851 0.99009901 1. 
0.98514851 0.98514851] mean value: 0.9891357362337219 MCC on Blind test: 0.62 Accuracy on Blind test: 0.81 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.67379546 0.64568853 0.6526382 0.67437029 0.65233517 0.66356587 0.65809655 0.66884041 0.65953517 0.64904428] mean value: 0.6597909927368164 key: score_time value: [0.00955462 0.00969028 0.0093205 0.0094049 0.00958753 0.00942659 0.00944066 0.01032162 0.00953197 0.00981259] mean value: 0.009609127044677734 key: test_mcc value: [0.91452919 0.95643752 0.95643752 1. 0.91485328 0.91452919 0.86732843 0.82506438 0.78530224 0.82213439] mean value: 0.8956616127817072 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95555556 0.97777778 0.97777778 1. 0.95555556 0.95555556 0.93333333 0.91111111 0.88888889 0.91111111] mean value: 0.9466666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95238095 0.97674419 0.97674419 1. 0.95652174 0.95833333 0.93617021 0.91666667 0.88372093 0.91304348] mean value: 0.9470325684863796 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 
0.91666667 0.92 0.91666667 0.88 0.95 0.91304348] mean value: 0.9496376811594203 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.95454545 0.95454545 1. 1. 1. 0.95652174 0.95652174 0.82608696 0.91304348] mean value: 0.9470355731225296 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.97727273 0.97727273 1. 0.95652174 0.95454545 0.93280632 0.91007905 0.89031621 0.91106719] mean value: 0.9464426877470355 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90909091 0.95454545 0.95454545 1. 0.91666667 0.92 0.88 0.84615385 0.79166667 0.84 ] mean value: 0.9012668997668998 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.96 Accuracy on Blind test: 0.98 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03017378 0.05092001 0.03264356 0.0517385 0.03246832 0.07408309 0.03232288 0.03246665 0.05935621 0.03307438] mean value: 0.04292473793029785 key: score_time value: [0.02918839 0.02444148 0.01487088 0.01399708 0.02390194 0.01415896 0.0149672 0.01498842 0.02188349 0.01332688] mean value: 0.018572473526000978 key: test_mcc value: [0.5216284 0.46720513 0.51089209 0.43557241 0.38112585 0.55841694 0.19960474 0.44784269 0.2903816 0.46640316] mean value: 0.4279073012043456 key: train_mcc value: [0.77727216 0.81448302 0.98519729 0.95177249 0.878915 0.8700435 0.94707011 0.94707011 0.96124772 0.98519693] mean value: 0.9118268331436081 key: test_accuracy value: [0.75555556 0.73333333 0.75555556 0.71111111 0.68888889 0.77777778 0.6 0.71111111 0.64444444 0.73333333] mean value: 0.7111111111111111 key: train_accuracy value: [0.87654321 0.89876543 0.99259259 0.97530864 0.93580247 0.9308642 0.97283951 0.97283951 0.98024691 0.99259259] mean value: 0.9528395061728395 key: test_fscore value: [0.71794872 0.71428571 0.74418605 0.73469388 0.65 0.77272727 0.60869565 0.66666667 0.68 0.73913043] mean value: 0.7028334382647542 key: train_fscore value: [0.85955056 0.88767123 0.99259259 0.97596154 0.93157895 0.92553191 0.97201018 0.97201018 0.98058252 0.99255583] mean value: 0.9490045499762084 key: test_precision value: [0.82352941 0.75 0.76190476 0.66666667 0.72222222 0.80952381 0.60869565 0.8125 0.62962963 0.73913043] mean value: 0.7323802588668318 key: train_precision value: [1. 1. 0.9950495 0.95305164 1. 1. 1. 1. 
0.96190476 0.99502488] mean value: 0.9905030785669636 key: test_recall value: [0.63636364 0.68181818 0.72727273 0.81818182 0.59090909 0.73913043 0.60869565 0.56521739 0.73913043 0.73913043] mean value: 0.6845849802371542 key: train_recall value: [0.75369458 0.79802956 0.99014778 1. 0.87192118 0.86138614 0.94554455 0.94554455 1. 0.99009901] mean value: 0.9156367360874018 key: test_roc_auc value: [0.75296443 0.73221344 0.75494071 0.71343874 0.68675889 0.77865613 0.59980237 0.71442688 0.64229249 0.73320158] mean value: 0.7108695652173913 key: train_roc_auc value: [0.87684729 0.89901478 0.99259864 0.97524752 0.93596059 0.93069307 0.97277228 0.97277228 0.98029557 0.99258645] mean value: 0.9528788469980003 key: test_jcc value: [0.56 0.55555556 0.59259259 0.58064516 0.48148148 0.62962963 0.4375 0.5 0.51515152 0.5862069 ] mean value: 0.5438762832252821 key: train_jcc value: [0.75369458 0.79802956 0.98529412 0.95305164 0.87192118 0.86138614 0.94554455 0.94554455 0.96190476 0.98522167] mean value: 0.9061592765342953 MCC on Blind test: 0.54 Accuracy on Blind test: 0.77 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.0267837 0.04602504 0.04733205 0.03782296 0.05901909 0.02543259 0.04678321 0.01800036 0.06643128 0.03821254] mean value: 0.041184282302856444 key: score_time value: [0.02365017 0.02335858 0.03110051 0.02066255 0.0384798 0.02338219 0.01257777 0.02225137 0.02346373 0.02367878] mean value: 0.024260544776916505 key: test_mcc value: [0.91452919 0.95643752 0.91106719 0.91106719 0.77865613 0.82506438 0.73559956 0.64752602 0.70501339 0.64426877] mean value: 0.802922934636812 key: train_mcc value: [0.85762118 0.86692207 0.84700001 0.85704185 0.86692207 0.87160416 0.87199635 0.86211613 0.88643125 0.88165855] mean value: 0.8669313605558044 key: test_accuracy value: [0.95555556 0.97777778 0.95555556 0.95555556 0.88888889 0.91111111 0.86666667 0.82222222 0.84444444 0.82222222] mean value: 0.9 key: train_accuracy value: [0.92839506 0.93333333 0.92345679 0.92839506 0.93333333 0.93580247 0.93580247 0.9308642 0.94320988 0.94074074] mean value: 0.9333333333333333 key: test_fscore value: [0.95238095 0.97674419 0.95454545 0.95454545 0.88888889 0.91666667 0.875 0.81818182 0.8627451 0.82608696] mean value: 0.9025785475816701 key: train_fscore value: [0.93012048 0.93430657 0.92420538 0.92944039 0.93430657 0.93564356 0.93658537 0.93170732 0.94320988 0.94117647] mean value: 0.9340701983296061 key: test_precision value: [1. 1. 
0.95454545 0.95454545 0.86956522 0.88 0.84 0.85714286 0.78571429 0.82608696] mean value: 0.8967600225861095 key: train_precision value: [0.91037736 0.92307692 0.91747573 0.91826923 0.92307692 0.93564356 0.92307692 0.91826923 0.9408867 0.93203883] mean value: 0.9242191416230418 key: test_recall value: [0.90909091 0.95454545 0.95454545 0.95454545 0.90909091 0.95652174 0.91304348 0.7826087 0.95652174 0.82608696] mean value: 0.9116600790513834 key: train_recall value: [0.95073892 0.94581281 0.93103448 0.9408867 0.94581281 0.93564356 0.95049505 0.94554455 0.94554455 0.95049505] mean value: 0.9442008486562942 key: test_roc_auc value: [0.95454545 0.97727273 0.9555336 0.9555336 0.88932806 0.91007905 0.86561265 0.82312253 0.84189723 0.82213439] mean value: 0.8995059288537549 key: train_roc_auc value: [0.92833976 0.93330244 0.92343803 0.92836414 0.93330244 0.93580208 0.93583866 0.93090036 0.94321563 0.94076477] mean value: 0.9333268302199678 key: test_jcc value: [0.90909091 0.95454545 0.91304348 0.91304348 0.8 0.84615385 0.77777778 0.69230769 0.75862069 0.7037037 ] mean value: 0.8268287029756295 key: train_jcc value: [0.86936937 0.87671233 0.85909091 0.86818182 0.87671233 0.87906977 0.88073394 0.87214612 0.89252336 0.88888889] mean value: 0.8763428838668663 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'mcsm_ppi2_affinity', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=169)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.47064686 0.37768698 0.39077091 0.46480179 0.4833498 0.96630669 0.25368643 0.32124949 0.53351617 0.37344742] mean value: 0.46354625225067136 key: score_time value: [0.03063655 0.02306867 0.02072167 0.02873063 0.02459288 0.01240277 0.01261353 0.03182149 0.0360291 0.02511287] mean value: 0.02457301616668701 key: test_mcc value: [0.83484711 0.95643752 0.91106719 0.91106719 0.77865613 0.82506438 0.68911026 0.64752602 0.70501339 0.64426877] mean value: 0.7903057965349736 key: train_mcc value: [0.79798935 0.86692207 0.84700001 0.85704185 0.90127552 0.87160416 0.92098717 0.86211613 0.88643125 0.88165855] mean value: 0.869302605153845 key: test_accuracy value: [0.91111111 0.97777778 0.95555556 0.95555556 0.88888889 0.91111111 0.84444444 0.82222222 0.84444444 0.82222222] mean value: 0.8933333333333333 key: train_accuracy value: [0.89876543 0.93333333 0.92345679 0.92839506 0.95061728 0.93580247 0.96049383 0.9308642 0.94320988 0.94074074] mean value: 0.9345679012345679 key: test_fscore value: [0.9 0.97674419 0.95454545 0.95454545 0.88888889 0.91666667 0.85106383 0.81818182 0.8627451 0.82608696] mean value: 0.8949468353222984 key: train_fscore value: [0.90072639 0.93430657 0.92420538 0.92944039 0.95098039 0.93564356 0.96039604 0.93170732 0.94320988 0.94117647] mean value: 0.9351792390184265 key: test_precision value: [1. 1. 0.95454545 0.95454545 0.86956522 0.88 0.83333333 0.85714286 0.78571429 0.82608696] mean value: 0.8960933559194428 key: train_precision value: [0.88571429 0.92307692 0.91747573 0.91826923 0.94634146 0.93564356 0.96039604 0.91826923 0.9408867 0.93203883] mean value: 0.9278112000318885 key: test_recall value: [0.81818182 0.95454545 0.95454545 0.95454545 0.90909091 0.95652174 0.86956522 0.7826087 0.95652174 0.82608696] mean value: 0.8982213438735178 key: train_recall value: [0.91625616 0.94581281 0.93103448 0.9408867 0.95566502 0.93564356 0.96039604 0.94554455 0.94554455 0.95049505] mean value: 0.9427278934790031 key: test_roc_auc value: [0.90909091 0.97727273 0.9555336 0.9555336 0.88932806 0.91007905 0.84387352 0.82312253 0.84189723 0.82213439] mean value: 0.8927865612648221 key: train_roc_auc value: [0.89872214 0.93330244 0.92343803 0.92836414 0.95060479 0.93580208 0.96049359 0.93090036 0.94321563 0.94076477] mean value: 0.9345607959810759 key: test_jcc value: [0.81818182 0.95454545 0.91304348 0.91304348 0.8 0.84615385 0.74074074 0.69230769 0.75862069 0.7037037 ] mean value: 0.8140340901810167 key: train_jcc value: [0.81938326 0.87671233 0.85909091 0.86818182 0.90654206 0.87906977 0.92380952 0.87214612 0.89252336 0.88888889] mean value: 0.8786348035374227 MCC on Blind test: 0.79 Accuracy on Blind test: 0.89
/home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:188: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./rpob_8020.py:191: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)