/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_8020.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 858
PASS: my_features_df and aa_df successfully combined
nrows: 858
ncols: 269
count of NULL values before imputation
or_mychisq          244
log10_or_mychisq    244
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 168
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 175
-------------------------------------------------------------
Successfully split data with stratification: 80/20
Train data size: (358, 175)
Test data size: (90, 175)
y_train numbers: Counter({0: 282, 1: 76})
y_train ratio: 3.710526315789474
y_test_numbers: Counter({0: 71, 1: 19})
y_test ratio: 3.736842105263158
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 282, 1: 282})
(564, 175)
Simple Random UnderSampling
Counter({0: 76, 1: 76})
(152, 175)
Simple Combined Over and UnderSampling
Counter({0: 282, 1: 282})
(564, 175)
SMOTE_NC OverSampling
Counter({0: 282, 1: 282})
(564, 175)
#####################################################################
Running ML analysis: 80/20 split
Gene name: embB
Drug name: ethambutol
Output directory: /home/tanu/git/Data/ethambutol/output/ml/tts_8020/
Sanity checks:
ML source data size: (448, 175)
Total input features: (358, 175)
Target feature numbers: Counter({0: 282, 1: 76})
Target features ratio: 3.710526315789474
#####################################################################
================================================================
Strucutral features (n): 36
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are: ['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are: ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
       'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts',
       'mcsm_ppi2_affinity', 'interface_dist',
       ...
       'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
       'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
      dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
       'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'],
      dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.0428791 0.04306722 0.03533816 0.03511214 0.036165 0.04105043 0.03672242 0.0365274 0.03594804 0.03573561]

mean value: 0.037854552268981934

key: score_time
value: [0.0125525 0.01230192 0.0132637 0.01333547 0.01366806 0.01352763 0.0132668 0.01345658 0.01344943 0.01346922]

mean value: 0.013229131698608398

key: test_mcc
value: [0.8174367 0.49365725 0.44883281 0.75134288 0.51785714 0.45374261 0.51785714 0.67857143 0.71842121 0.72019314]

mean value: 0.6117912318227132

key: train_mcc
value: [0.79905267 0.83859776 0.82668723 0.78613568 0.81652347 0.8365424 0.79631634 0.84662994 0.82882139 0.80889737]

mean value: 0.818420423682095

key: test_accuracy
value: [0.94444444 0.86111111 0.83333333 0.91666667 0.83333333 0.83333333 0.83333333 0.88888889 0.91428571 0.91428571]

mean value: 0.8773015873015872

key: train_accuracy
value: [0.93478261 0.94720497 0.94409938 0.93167702 0.94099379 0.94720497 0.93478261 0.95031056 0.94427245 0.9380805 ]

mean value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
0.9413408841797588
key: test_fscore
value: [0.83333333 0.44444444 0.5 0.76923077 0.625 0.4 0.625 0.75 0.72727273 0.76923077]
mean value: 0.6443512043512043
key: train_fscore
value: [0.83464567 0.86821705 0.85714286 0.81666667 0.84552846 0.864 0.82644628 0.87096774 0.859375 0.84126984]
mean value: 0.8484259566846042
key: test_precision
value: [1. 1. 0.75 1. 0.625 1. 0.625 0.75 1. 0.83333333]
mean value: 0.8583333333333334
key: train_precision
value: [0.9137931 0.93333333 0.93103448 0.94230769 0.94545455 0.94736842 0.94339623 0.96428571 0.93220339 0.92982456]
mean value: 0.9383001470289924
key: test_recall
value: [0.71428571 0.28571429 0.375 0.625 0.625 0.25 0.625 0.75 0.57142857 0.71428571]
mean value: 0.5535714285714286
key: train_recall
value: [0.76811594 0.8115942 0.79411765 0.72058824 0.76470588 0.79411765 0.73529412 0.79411765 0.79710145 0.76811594]
mean value: 0.7747868712702473
key: test_roc_auc
value: [0.85714286 0.64285714 0.66964286 0.8125 0.75892857 0.625 0.75892857 0.83928571 0.78571429 0.83928571]
mean value: 0.7589285714285714
key: train_roc_auc
value: [0.87417655 0.89789196 0.88918481 0.85438861 0.87644743 0.89115331 0.86174155 0.89312182 0.89067671 0.87618396]
mean value: 0.8804966692724209
key: test_jcc
value: [0.71428571 0.28571429 0.33333333 0.625 0.45454545 0.25 0.45454545 0.6 0.57142857 0.625 ]
mean value: 0.49138528138528137
key: train_jcc
value: [0.71621622 0.76712329 0.75 0.69014085 0.73239437 0.76056338 0.70422535 0.77142857 0.75342466 0.7260274 ]
mean value: 0.7371544073772512
MCC on Blind test: 0.76
Accuracy on Blind test: 0.92

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])
key: fit_time
value: [0.90058208 0.80069208 0.91843367 0.97248459 0.82589054 0.95298767 0.91078734 0.78506351 1.01478314 0.77273583]
mean value: 0.8854440450668335
key: score_time
value: [0.0133543 0.01355958 0.0135107 0.01562595 0.01443386 0.01595211 0.01460838 0.01361895 0.01384115 0.01354361]
mean value: 0.014204859733581543
key: test_mcc
value: [0.75032247 0.61369649 0.65737574 0.75134288 0.41267736 0.45374261 0.67857143 0.77151675 0.61237244 0.61237244]
mean value: 0.6313990594842422
key: train_mcc
value: [0.85829157 0.96310935 0.98135711 0.96271422 0.97192696 0.89565519 0.96271422 1. 0.89735962 0.96314048]
mean value: 0.9456268716050406
key: test_accuracy
value: [0.91666667 0.88888889 0.88888889 0.91666667 0.80555556 0.83333333 0.88888889 0.91666667 0.88571429 0.88571429]
mean value: 0.8826984126984126
key: train_accuracy
value: [0.95341615 0.98757764 0.99378882 0.98757764 0.99068323 0.96583851 0.98757764 1. 0.96594427 0.9876161 ]
mean value: 0.982001999884622
key: test_fscore
value: [0.8 0.6 0.71428571 0.76923077 0.53333333 0.4 0.75 0.82352941 0.6 0.66666667]
mean value: 0.665704589528119
key: train_fscore
value: [0.88549618 0.97101449 0.98529412 0.97058824 0.97777778 0.91603053 0.97058824 1.
0.91851852 0.97101449]
mean value: 0.9566322587596089
key: test_precision
value: [0.75 1. 0.83333333 1. 0.57142857 1. 0.75 0.77777778 1. 0.8 ]
mean value: 0.8482539682539683
key: train_precision
value: [0.93548387 0.97101449 0.98529412 0.97058824 0.98507463 0.95238095 0.97058824 1. 0.93939394 0.97101449]
mean value: 0.9680832963350846
key: test_recall
value: [0.85714286 0.42857143 0.625 0.625 0.5 0.25 0.75 0.875 0.42857143 0.57142857]
mean value: 0.5910714285714286
key: train_recall
value: [0.84057971 0.97101449 0.98529412 0.97058824 0.97058824 0.88235294 0.97058824 1. 0.89855072 0.97101449]
mean value: 0.9460571184995737
key: test_roc_auc
value: [0.89408867 0.71428571 0.79464286 0.8125 0.69642857 0.625 0.83928571 0.90178571 0.71428571 0.76785714]
mean value: 0.7760160098522167
key: train_roc_auc
value: [0.91238472 0.98155468 0.99067855 0.98135711 0.98332561 0.93527096 0.98135711 1. 0.94140135 0.98157024]
mean value: 0.968890032593287
key: test_jcc
value: [0.66666667 0.42857143 0.55555556 0.625 0.36363636 0.25 0.6 0.7 0.42857143 0.5 ]
mean value: 0.5118001443001443
key: train_jcc
value: [0.79452055 0.94366197 0.97101449 0.94285714 0.95652174 0.84507042 0.94285714 1.
0.84931507 0.94366197]
mean value: 0.9189480500233883
MCC on Blind test: 0.8
Accuracy on Blind test: 0.93

Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())])
key: fit_time
value: [0.0145061 0.01411271 0.00958824 0.00958681 0.01035023 0.00946975 0.00950241 0.01032686 0.01084828 0.00958943]
mean value: 0.01078808307647705
key: score_time
value: [0.01501393 0.01024151 0.00904894 0.00896454 0.00931573 0.00904608 0.00889969 0.00926065 0.0098412 0.00942135]
mean value: 0.009905362129211425
key: test_mcc
value: [0.40804713 0.34527065 0.5157267 0.6172134 0.3086067 0.2438548 0.29366622 0.40089186 0.40147753 0.68640647]
mean value: 0.4221161472620327
key: train_mcc
value: [0.62471066 0.5915192 0.67760901 0.65520113 0.64418833 0.66117244 0.66633852 0.52011895 0.60022186 0.64725803]
mean value: 0.62883381442086
key: test_accuracy
value: [0.69444444 0.80555556 0.80555556 0.86111111 0.75 0.75 0.69444444 0.66666667 0.68571429 0.88571429]
mean value: 0.7599206349206349
key: train_accuracy
value: [0.85093168 0.83850932 0.87267081 0.86335404 0.8757764 0.86956522 0.86645963 0.72981366 0.79566563 0.85758514]
mean value: 0.8420331519335423
key: test_fscore
value: [0.52173913 0.46153846 0.63157895 0.70588235 0.47058824 0.4 0.47619048 0.53846154 0.52173913 0.75 ]
mean value: 0.5477718272663756
key: train_fscore
value: [0.70731707 0.68292683 0.74534161 0.72839506 0.72222222 0.73417722 0.73619632 0.60273973 0.67 0.72289157]
mean value: 0.705220762779721
key: test_precision
value: [0.375 0.5 0.54545455 0.66666667 0.44444444 0.42857143 0.38461538 0.38888889 0.375 0.66666667]
mean value: 0.4775308025308025
key: train_precision
value: [0.61052632 0.58947368 0.64516129 0.62765957 0.68421053 0.64444444 0.63157895 0.43708609 0.51145038 0.6185567 ]
mean value: 0.6000147958344869
key: test_recall
value: [0.85714286 0.42857143 0.75 0.75 0.5 0.375 0.625 0.875 0.85714286 0.85714286]
mean value: 0.6875
key: train_recall
value: [0.84057971 0.8115942 0.88235294 0.86764706 0.76470588 0.85294118 0.88235294 0.97058824 0.97101449 0.86956522]
mean value: 0.8713341858482523
key: test_roc_auc
value: [0.75615764 0.66256158 0.78571429 0.82142857 0.66071429 0.61607143 0.66964286 0.74107143 0.75 0.875 ]
mean value: 0.7338362068965517
key: train_roc_auc
value: [0.84716733 0.828722 0.87621584 0.86492589 0.83510885 0.86347846 0.87227883 0.81797128 0.85952299 0.86194796]
mean value: 0.8527339442515047
key: test_jcc
value: [0.35294118 0.3 0.46153846 0.54545455 0.30769231 0.25 0.3125 0.36842105 0.35294118 0.6 ]
mean value: 0.385148872025807
key: train_jcc
value: [0.54716981 0.51851852 0.59405941 0.57281553 0.56521739 0.58 0.58252427 0.43137255 0.5037594 0.56603774]
mean value: 0.5461474616274362
MCC on Blind test: 0.53
Accuracy on Blind test: 0.82

Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())])
key: fit_time
value: [0.01099086 0.01003337 0.010741 0.01009178 0.01001143 0.01029563 0.01055908 0.01037931 0.00990558 0.00981903]
mean value: 0.01028270721435547
key: score_time
value: [0.00966334 0.00921583 0.00926971 0.009969 0.00917459 0.00983572 0.00905895 0.0090332 0.00921869 0.00948954]
mean value: 0.009392857551574707
key: test_mcc
value: [ 0.75032247 0.2085873 0.16205093 0.47809144 0.58149992 -0.18898224 0.35714286 0.0805823 0.49391458 0.10206207]
mean value: 0.30252716321584283
key: train_mcc
value: [0.39769343 0.45897008 0.43461577 0.4144431 0.49728141 0.50633817 0.48412839 0.39761525 0.41442016 0.42938548]
mean value: 0.4434891238000965
key: test_accuracy
value: [0.91666667 0.77777778 0.77777778 0.83333333 0.86111111 0.66666667 0.77777778 0.75 0.85714286 0.77142857]
mean value: 0.798968253968254
key: train_accuracy
value: [0.81677019 0.83850932 0.82608696 0.82608696 0.84782609 0.85093168 0.84161491 0.81987578 0.82352941 0.82352941]
mean value: 0.8314760686883449
key: test_fscore
value: [0.8 0.33333333 0.2 0.57142857 0.66666667 0. 0.5 0.18181818 0.54545455 0.2 ]
mean value: 0.3998701298701299
key: train_fscore
value: [0.4957265 0.52727273 0.53333333 0.5 0.57391304 0.57894737 0.57142857 0.49122807 0.50434783 0.52892562]
mean value: 0.5305123055757547
key: test_precision
value: [0.75 0.4 0.5 0.66666667 0.71428571 0.
0.5 0.33333333 0.75 0.33333333] mean value: 0.49476190476190474 key: train_precision value: [0.60416667 0.70731707 0.61538462 0.63636364 0.70212766 0.7173913 0.66666667 0.60869565 0.63043478 0.61538462] mean value: 0.6503932672341834 key: test_recall value: [0.85714286 0.28571429 0.125 0.5 0.625 0. 0.5 0.125 0.42857143 0.14285714] mean value: 0.35892857142857143 key: train_recall value: [0.42028986 0.42028986 0.47058824 0.41176471 0.48529412 0.48529412 0.5 0.41176471 0.42028986 0.46376812] mean value: 0.44893435635123613 key: test_roc_auc value: [0.89408867 0.591133 0.54464286 0.71428571 0.77678571 0.42857143 0.67857143 0.52678571 0.69642857 0.53571429] mean value: 0.6387007389162562 key: train_roc_auc value: [0.67259552 0.68642951 0.69592404 0.67438629 0.715088 0.71705651 0.71653543 0.67044928 0.67668036 0.69251398] mean value: 0.6917658928125731 key: test_jcc value: [0.66666667 0.2 0.11111111 0.4 0.5 0. 0.33333333 0.1 0.375 0.11111111] mean value: 0.2797222222222222 key: train_jcc value: [0.32954545 0.35802469 0.36363636 0.33333333 0.40243902 0.40740741 0.4 0.3255814 0.3372093 0.35955056] mean value: 0.3616727534142999 MCC on Blind test: 0.38 Accuracy on Blind test: 0.81 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), 
('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00918794 0.0107317 0.01008606 0.0098629 0.00994635 0.00929689 0.01015115 0.01002431 0.01017118 0.01010108] mean value: 0.009955954551696778 key: score_time value: [0.08292389 0.01307893 0.01189065 0.01137209 0.01231623 0.0125916 0.01275134 0.01540232 0.01305771 0.01362872] mean value: 0.019901347160339356 key: test_mcc value: [-0.08304548 0.34404556 -0.12964074 0.45374261 0.45374261 -0.09035079 0.32232919 0.31622777 -0.08574929 -0.08574929] mean value: 0.1415552123996879 key: train_mcc value: [0.49666776 0.39869846 0.50648694 0.43431192 0.43289908 0.46108514 0.37190677 0.38709528 0.47029901 0.47029901] mean value: 0.4429749369850348 key: test_accuracy value: [0.77777778 0.83333333 0.72222222 0.83333333 0.83333333 0.75 0.80555556 0.80555556 0.77142857 0.77142857] mean value: 0.7903968253968254 key: train_accuracy value: [0.85093168 0.82919255 0.85403727 0.83850932 0.83850932 0.8447205 0.82608696 0.82919255 0.84520124 0.84520124] mean value: 0.8401582601003789 key: test_fscore value: [0. 0.25 0. 0.4 0.4 0. 0.36363636 0.22222222 0. 0. ] mean value: 0.16358585858585858 key: train_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
_warn_prf(average, modifier, msg_start, len(result)) [0.51020408 0.38202247 0.48351648 0.40909091 0.42222222 0.46808511 0.33333333 0.36781609 0.47916667 0.47916667] mean value: 0.4334624033376049 key: test_precision value: [0. 1. 0. 1. 1. 0. 0.66666667 1. 0. 0. ] mean value: 0.4666666666666667 key: train_precision value: [0.86206897 0.85 0.95652174 0.9 0.86363636 0.84615385 0.875 0.84210526 0.85185185 0.85185185] mean value: 0.8699189881299484 key: test_recall value: [0. 0.14285714 0. 0.25 0.25 0. 0.25 0.125 0. 0. ] mean value: 0.10178571428571428 key: train_recall value: [0.36231884 0.24637681 0.32352941 0.26470588 0.27941176 0.32352941 0.20588235 0.23529412 0.33333333 0.33333333] mean value: 0.290771526001705 key: test_roc_auc value: [0.48275862 0.57142857 0.46428571 0.625 0.625 0.48214286 0.60714286 0.5625 0.48214286 0.48214286] mean value: 0.538454433497537 key: train_roc_auc value: [0.67325428 0.61725955 0.6597962 0.62841593 0.63380037 0.65389069 0.59900417 0.61174155 0.65879265 0.65879265] mean value: 0.6394748047362482 key: test_jcc value: [0. 0.14285714 0. 0.25 0.25 0. 0.22222222 0.125 0. 0. 
] mean value: 0.0990079365079365 key: train_jcc value: [0.34246575 0.23611111 0.31884058 0.25714286 0.26760563 0.30555556 0.2 0.22535211 0.31506849 0.31506849] mean value: 0.2783210589724569 MCC on Blind test: 0.3 Accuracy on Blind test: 0.81 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, 
oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01502514 0.01375127 0.01382565 0.01428938 0.01496148 0.01451397 0.01644087 0.01448464 0.01387644 0.01542759] mean value: 0.014659643173217773 key: score_time value: [0.01057029 0.01050878 0.01076698 0.01031327 0.01009369 0.01014829 0.01023912 0.0101285 0.01013803 0.01021695] mean value: 0.010312390327453614 key: test_mcc value: [0.49365725 0. 0.16205093 0.31622777 0.47809144 0. 0.2362278 0.44883281 0. 
0.49236596] mean value: 0.26274539603801333 key: train_mcc value: [0.59932645 0.65172653 0.65308612 0.65308612 0.63080736 0.67251176 0.65979056 0.65979056 0.63347284 0.6251464 ] mean value: 0.6438744675217546 key: test_accuracy value: [0.86111111 0.80555556 0.77777778 0.80555556 0.83333333 0.77777778 0.77777778 0.83333333 0.8 0.85714286] mean value: 0.812936507936508 key: train_accuracy value: [0.8757764 0.89130435 0.89130435 0.89130435 0.88509317 0.89751553 0.89440994 0.89440994 0.88544892 0.88235294] mean value: 0.8888919870007499 key: test_fscore value: [0.44444444 0. 0.2 0.22222222 0.57142857 0. 0.33333333 0.5 0. 0.44444444] mean value: 0.2715873015873016 key: train_fscore value: [0.6 0.68468468 0.65346535 0.65346535 0.62626263 0.68571429 0.67924528 0.67924528 0.6407767 0.62 ] mean value: 0.6522859554797765 key: test_precision value: [1. 0. 0.5 1. 0.66666667 0. 0.5 0.75 0. 1. ] mean value: 0.5416666666666666 key: train_precision value: [0.96774194 0.9047619 1. 1. 1. 0.97297297 0.94736842 0.94736842 0.97058824 1. ] mean value: 0.9710801890618129 key: test_recall value: [0.28571429 0. 0.125 0.125 0.5 0. 0.25 0.375 0. 0.28571429] mean value: 0.19464285714285715 key: train_recall value: [0.43478261 0.55072464 0.48529412 0.48529412 0.45588235 0.52941176 0.52941176 0.52941176 0.47826087 0.44927536] mean value: 0.49277493606138106 key: test_roc_auc value: [0.64285714 0.5 0.54464286 0.5625 0.71428571 0.5 0.58928571 0.66964286 0.5 0.64285714] mean value: 0.5866071428571429 key: train_roc_auc value: [0.71541502 0.76745718 0.74264706 0.74264706 0.72794118 0.76273738 0.76076887 0.76076887 0.73716193 0.72463768] mean value: 0.7442182233759956 key: test_jcc value: [0.28571429 0. 0.11111111 0.125 0.4 0. 0.2 0.33333333 0. 
0.28571429] mean value: 0.1740873015873016 key: train_jcc value: [0.42857143 0.52054795 0.48529412 0.48529412 0.45588235 0.52173913 0.51428571 0.51428571 0.47142857 0.44927536] mean value: 0.4846604454765825 MCC on Blind test: 0.41 Accuracy on Blind test: 0.83 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.45310259 1.60784292 1.29070163 1.57894993 1.62366939 1.49906635 1.52056623 1.68461752 1.8801136 1.26551223] mean value: 1.5404142379760741 key: score_time value: [0.0127666 0.01392817 0.01249576 0.01368737 0.0171051 0.01363277 0.01388717 0.02197933 0.01406717 0.01296186] mean value: 0.014651131629943848 key: test_mcc value: [0.68887476 0.49365725 0.36493797 0.75134288 0.46291005 0.16205093 0.67857143 0.6172134 0.61237244 0.61237244] mean value: 0.5444303548709625 key: train_mcc value: [0.98155468 0.99086739 0.98135711 0.98135711 0.97192696 0.97192696 0.98135711 0.9906716 0.98157024 0.9722504 ] mean value: 0.9804839554759929 key: test_accuracy value: [0.88888889 0.86111111 0.80555556 0.91666667 0.80555556 0.77777778 0.88888889 0.86111111 0.88571429 0.88571429] mean value: 0.8576984126984127 key: train_accuracy value: [0.99378882 0.99689441 0.99378882 0.99378882 0.99068323 0.99068323 0.99378882 0.99689441 0.99380805 0.99071207] mean value: 0.993483068284522 key: test_fscore value: [0.75 0.44444444 0.46153846 0.76923077 0.58823529 0.2 0.75 0.70588235 0.6 0.66666667] mean value: 0.5935997988939166 key: train_fscore value: [0.98550725 0.99280576 0.98529412 0.98529412 0.97777778 0.97777778 0.98529412 
0.99259259 0.98550725 0.97810219] mean value: 0.9845952939019653 key: test_precision value: [0.66666667 1. 0.6 1. 0.55555556 0.5 0.75 0.66666667 1. 0.8 ] mean value: 0.7538888888888888 key: train_precision value: [0.98550725 0.98571429 0.98529412 0.98529412 0.98507463 0.98507463 0.98529412 1. 0.98550725 0.98529412] mean value: 0.9868054502787488 key: test_recall value: [0.85714286 0.28571429 0.375 0.625 0.625 0.125 0.75 0.75 0.42857143 0.57142857] mean value: 0.5392857142857143 key: train_recall value: [0.98550725 1. 0.98529412 0.98529412 0.97058824 0.97058824 0.98529412 0.98529412 0.98550725 0.97101449] mean value: 0.9824381926683717 key: test_roc_auc value: [0.87684729 0.64285714 0.65178571 0.8125 0.74107143 0.54464286 0.83928571 0.82142857 0.71428571 0.76785714] mean value: 0.741256157635468 key: train_roc_auc value: [0.99077734 0.99802372 0.99067855 0.99067855 0.98332561 0.98332561 0.99067855 0.99264706 0.99078512 0.98353874] mean value: 0.9894458866612843 key: test_jcc value: [0.6 0.28571429 0.3 0.625 0.41666667 0.11111111 0.6 0.54545455 0.42857143 0.5 ] mean value: 0.44125180375180373 key: train_jcc value: [0.97142857 0.98571429 0.97101449 0.97101449 0.95652174 0.95652174 0.97101449 0.98529412 0.97142857 0.95714286] mean value: 0.9697095359883083 MCC on Blind test: 0.72 Accuracy on Blind test: 0.91 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), 
('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02383637 0.02124 0.01789117 0.01587868 0.0169847 0.01740932 0.01692271 0.01825738 0.01615906 0.01933932] mean value: 0.018391871452331544 key: score_time value: [0.01232696 0.010185 0.00895429 0.00890899 0.00881505 0.00891113 0.00893497 0.00903082 0.00903559 0.00908685] mean value: 0.009418964385986328 key: test_mcc value: [0.8174367 0.75032247 1. 0.91914503 0.53300179 0.66143783 0.51785714 0.86189161 0.49391458 0.81649658] mean value: 0.737150373784473 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94444444 0.91666667 1. 0.97222222 0.77777778 0.88888889 0.83333333 0.94444444 0.85714286 0.94285714] mean value: 0.9077777777777778 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83333333 0.8 1. 0.93333333 0.63636364 0.66666667 0.625 0.88888889 0.54545455 0.83333333] mean value: 0.7762373737373737 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.75 1. 1. 0.5 1. 0.625 0.8 0.75 1. ] mean value: 0.8425 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.71428571 0.85714286 1. 0.875 0.875 0.5 0.625 1. 0.42857143 0.71428571] mean value: 0.7589285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.85714286 0.89408867 1. 0.9375 0.8125 0.75 0.75892857 0.96428571 0.69642857 0.85714286] mean value: 0.8528017241379311 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.71428571 0.66666667 1. 0.875 0.46666667 0.5 0.45454545 0.8 0.375 0.71428571] mean value: 0.6566450216450217 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.84 Accuracy on Blind test: 0.94 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10711122 0.10529757 0.10800672 0.10813236 0.10637426 0.10711551 0.106498 0.10879707 0.10712099 0.10767961] mean value: 0.10721333026885986 key: score_time value: [0.01776719 0.01798534 0.01802421 0.01788712 0.01799774 0.01796985 0.01869559 0.0179739 0.01853013 0.01776457] mean value: 0.018059563636779786 key: test_mcc value: [0.71962292 0.1872493 0.65737574 0.66143783 0.56354451 0.16205093 0.58149992 0.58149992 0.34299717 0.71842121] mean value: 0.5175699436874827 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91666667 0.80555556 0.88888889 0.88888889 0.83333333 0.77777778 0.86111111 0.86111111 0.82857143 0.91428571] mean value: 0.8576190476190476 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.72727273 0.22222222 0.71428571 0.66666667 0.66666667 0.2 0.66666667 0.66666667 0.25 0.72727273] mean value: 0.5507720057720058 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.5 0.83333333 1. 0.6 0.5 0.71428571 0.71428571 1. 1. ] mean value: 0.7861904761904762 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.57142857 0.14285714 0.625 0.5 0.75 0.125 0.625 0.625 0.14285714 0.57142857] mean value: 0.46785714285714286 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.78571429 0.55418719 0.79464286 0.75 0.80357143 0.54464286 0.77678571 0.77678571 0.57142857 0.78571429] mean value: 0.7143472906403942 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57142857 0.125 0.55555556 0.5 0.5 0.11111111 0.5 0.5 0.14285714 0.57142857] mean value: 0.40773809523809523 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.68 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00979185 0.01068974 0.01069331 0.00985026 0.0098598 0.00974846 0.01084542 0.00980043 0.01022649 0.01085448] mean value: 0.010236024856567383 key: score_time value: [0.00915313 0.00982594 0.00975847 0.00891304 0.00955153 0.00960112 0.0089066 0.00889993 0.00894642 0.00965571] mean value: 0.009321188926696778 key: test_mcc value: [ 0.43895468 -0.03138824 0.19642857 0.2438548 0.29366622 0.75134288 0.26519742 0.07503225 0.49391458 0.15161961] mean value: 0.28786227687707744 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.72222222 0.69444444 0.72222222 0.75 0.69444444 0.91666667 0.72222222 0.69444444 0.85714286 0.74285714] mean value: 0.7516666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.54545455 0.15384615 0.375 0.4 0.47619048 0.76923077 0.44444444 0.26666667 0.54545455 0.30769231] mean value: 0.4283979908979909 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.4 0.16666667 0.375 0.42857143 0.38461538 1. 0.4 0.28571429 0.75 0.33333333] mean value: 0.4523901098901099 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.85714286 0.14285714 0.375 0.375 0.625 0.625 0.5 0.25 0.42857143 0.28571429] mean value: 0.4464285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77339901 0.48522167 0.59821429 0.61607143 0.66964286 0.8125 0.64285714 0.53571429 0.69642857 0.57142857] mean value: 0.6401477832512316 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.375 0.08333333 0.23076923 0.25 0.3125 0.625 0.28571429 0.15384615 0.375 0.18181818] mean value: 0.28729811854811854 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.72 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.50880575 1.47889471 1.49357581 1.51851034 1.48310709 1.48366189 1.48690367 1.52164292 1.48334312 1.50404835] mean value: 1.4962493658065796 key: score_time value: [0.09659505 0.09937906 0.0998745 0.09763646 0.09497857 0.09289384 0.09455442 0.09473515 0.09559774 0.09992361] mean value: 0.09661684036254883 key: test_mcc value: [1. 0.49365725 0.83666003 0.66143783 0.67857143 0.75134288 0.77151675 0.77151675 0.61237244 0.90971765] mean value: 0.7486793002774902 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.86111111 0.94444444 0.88888889 0.88888889 0.91666667 0.91666667 0.91666667 0.88571429 0.97142857] mean value: 0.919047619047619 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.44444444 0.85714286 0.66666667 0.75 0.76923077 0.82352941 0.82352941 0.6 0.92307692] mean value: 0.7657620484091072 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 0.75 1. 0.77777778 0.77777778 1. 1. ] mean value: 0.9305555555555556 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.28571429 0.75 0.5 0.75 0.625 0.875 0.875 0.42857143 0.85714286] mean value: 0.6946428571428571 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.64285714 0.875 0.75 0.83928571 0.8125 0.90178571 0.90178571 0.71428571 0.92857143] mean value: 0.8366071428571429 key: train_roc_auc value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.28571429 0.75 0.5 0.6 0.625 0.7 0.7 0.42857143 0.85714286] mean value: 0.6446428571428571 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.83 Accuracy on Blind test: 0.94 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( key: fit_time value: [1.79229784 0.95148969 0.91167283 0.94181609 0.88025999 0.92065978 0.91286707 0.9315846 0.88310695 0.94630051] mean value: 1.0072055339813233 key: score_time value: [0.20599699 0.3146832 0.24151015 0.27133441 0.13036633 0.22741055 0.23978066 0.25486302 0.21415901 0.17490315] mean value: 0.2275007486343384 key: test_mcc value: [0.71962292 0.34404556 0.66143783 0.66143783 0.47809144 0.45374261 0.75032247 0.47809144 0.49236596 0.61237244] mean value: 0.565153049910657 key: train_mcc value: [0.88709235 0.92532149 0.9244842 0.93401658 0.9244842 0.9244842 0.90518666 0.90534273 0.92534731 0.90647794] mean value: 0.9162237656285032 key: test_accuracy value: [0.91666667 0.83333333 0.88888889 0.88888889 0.83333333 0.83333333 0.91666667 0.83333333 0.85714286 0.88571429] mean value: 0.8687301587301587 key: train_accuracy value: [0.96273292 0.97515528 0.97515528 0.97826087 0.97515528 0.97515528 0.9689441 0.9689441 0.9752322 0.96904025] mean value: 0.9723775551410495 key: test_fscore value: [0.72727273 0.25 0.66666667 0.66666667 0.57142857 0.4 0.8 0.57142857 0.44444444 0.6 ] mean value: 0.5697907647907648 key: train_fscore value: [0.90769231 0.93939394 0.93846154 0.94656489 0.93846154 0.93846154 0.92307692 0.921875 0.94029851 0.92307692] mean value: 0.9317363101583578 key: test_precision value: [1. 1. 1. 1. 0.66666667 1. 0.85714286 0.66666667 1. 1. 
] mean value: 0.919047619047619 key: train_precision value: [0.96721311 0.98412698 0.98387097 0.98412698 0.98387097 0.98387097 0.96774194 0.98333333 0.96923077 0.98360656] mean value: 0.9790992581658896 key: test_recall value: [0.57142857 0.14285714 0.5 0.5 0.5 0.25 0.75 0.5 0.28571429 0.42857143] mean value: 0.44285714285714284 key: train_recall value: [0.85507246 0.89855072 0.89705882 0.91176471 0.89705882 0.89705882 0.88235294 0.86764706 0.91304348 0.86956522] mean value: 0.8889173060528559 key: test_roc_auc value: [0.78571429 0.57142857 0.75 0.75 0.71428571 0.625 0.85714286 0.71428571 0.64285714 0.71428571] mean value: 0.7125 key: train_roc_auc value: [0.92358366 0.94729908 0.94656091 0.95391385 0.94656091 0.94656091 0.93723946 0.93185503 0.95258473 0.9328141 ] mean value: 0.941897263713926 key: test_jcc value: [0.57142857 0.14285714 0.5 0.5 0.4 0.25 0.66666667 0.4 0.28571429 0.42857143] mean value: 0.4145238095238095 key: train_jcc value: [0.83098592 0.88571429 0.88405797 0.89855072 0.88405797 0.88405797 0.85714286 0.85507246 0.88732394 0.85714286] mean value: 0.8724106960604205 MCC on Blind test: 0.76 Accuracy on Blind test: 0.92 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02311754 0.00943446 0.00950122 0.00938201 0.00950313 0.00959158 0.00952291 0.00950456 0.00949812 0.00953507] mean value: 0.010859060287475585 key: score_time value: [0.01207566 0.0087018 0.00878215 0.00869799 0.00875616 0.00878167 0.0086906 0.00866127 0.00872397 0.00867581] mean value: 0.00905470848083496 key: test_mcc value: [ 0.75032247 0.2085873 0.16205093 0.47809144 0.58149992 -0.18898224 0.35714286 0.0805823 0.49391458 0.10206207] mean value: 0.30252716321584283 key: train_mcc value: [0.39769343 0.45897008 0.43461577 0.4144431 0.49728141 0.50633817 0.48412839 0.39761525 0.41442016 0.42938548] mean value: 0.4434891238000965 key: test_accuracy value: [0.91666667 0.77777778 0.77777778 0.83333333 0.86111111 0.66666667 0.77777778 0.75 0.85714286 0.77142857] mean value: 0.798968253968254 key: train_accuracy value: [0.81677019 0.83850932 0.82608696 0.82608696 0.84782609 0.85093168 0.84161491 0.81987578 0.82352941 0.82352941] mean value: 0.8314760686883449 key: test_fscore value: [0.8 0.33333333 0.2 0.57142857 0.66666667 0. 0.5 0.18181818 0.54545455 0.2 ] mean value: 0.3998701298701299 key: train_fscore value: [0.4957265 0.52727273 0.53333333 0.5 0.57391304 0.57894737 0.57142857 0.49122807 0.50434783 0.52892562] mean value: 0.5305123055757547 key: test_precision value: [0.75 0.4 0.5 0.66666667 0.71428571 0. 
0.5 0.33333333 0.75 0.33333333] mean value: 0.49476190476190474 key: train_precision value: [0.60416667 0.70731707 0.61538462 0.63636364 0.70212766 0.7173913 0.66666667 0.60869565 0.63043478 0.61538462] mean value: 0.6503932672341834 key: test_recall value: [0.85714286 0.28571429 0.125 0.5 0.625 0. 0.5 0.125 0.42857143 0.14285714] mean value: 0.35892857142857143 key: train_recall value: [0.42028986 0.42028986 0.47058824 0.41176471 0.48529412 0.48529412 0.5 0.41176471 0.42028986 0.46376812] mean value: 0.44893435635123613 key: test_roc_auc value: [0.89408867 0.591133 0.54464286 0.71428571 0.77678571 0.42857143 0.67857143 0.52678571 0.69642857 0.53571429] mean value: 0.6387007389162562 key: train_roc_auc value: [0.67259552 0.68642951 0.69592404 0.67438629 0.715088 0.71705651 0.71653543 0.67044928 0.67668036 0.69251398] mean value: 0.6917658928125731 key: test_jcc value: [0.66666667 0.2 0.11111111 0.4 0.5 0. 0.33333333 0.1 0.375 0.11111111] mean value: 0.2797222222222222 key: train_jcc value: [0.32954545 0.35802469 0.36363636 0.33333333 0.40243902 0.40740741 0.4 0.3255814 0.3372093 0.35955056] mean value: 0.3616727534142999 MCC on Blind test: 0.38 Accuracy on Blind test: 0.81 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09641838 0.05527234 0.0620811 0.07439089 0.05781317 0.06530595 0.06510544 0.08454537 0.20302868 0.05387855] mean value: 0.08178398609161378 key: score_time value: [0.0110662 0.01033831 0.01064897 0.01097441 0.01094913 0.01161528 0.01144314 0.01093102 0.01129246 0.01139951] mean value: 0.011065840721130371 key: test_mcc value: [1. 0.91914503 1. 0.91914503 0.80582296 0.91914503 0.9258201 0.86189161 0.72019314 0.90971765] mean value: 0.8980880554918083 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97222222 1. 0.97222222 0.91666667 0.97222222 0.97222222 0.94444444 0.91428571 0.97142857] mean value: 0.9635714285714285 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.93333333 1. 0.93333333 0.84210526 0.93333333 0.94117647 0.88888889 0.76923077 0.92307692] mean value: 0.9164478314942711 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.875 1. 1. 0.72727273 1. 0.88888889 0.8 0.83333333 1. ] mean value: 0.912449494949495 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.875 1. 0.875 1. 1. 0.71428571 0.85714286] mean value: 0.9321428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [1. 0.98275862 1. 0.9375 0.94642857 0.9375 0.98214286 0.96428571 0.83928571 0.92857143] mean value: 0.9518472906403941 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.875 1. 0.875 0.72727273 0.875 0.88888889 0.8 0.625 0.85714286] mean value: 0.8523304473304474 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.97 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04570127 0.09128404 0.06899452 0.07317495 0.06892586 0.06894684 0.07744288 0.03692055 0.05794311 0.05879235] mean value: 0.06481263637542725 key: score_time value: [0.01260543 0.01506281 0.02454662 0.02456522 0.02190804 0.01689649 0.01267576 0.01267529 0.01691985 0.01226139] mean value: 0.017011690139770507 key: test_mcc value: [0.72192954 0.6453202 0.55814043 0.5157267 0.66077483 0.67857143 0.58149992 0.6172134 0.81649658 0.72019314] mean value: 0.6515866163842986 key: train_mcc value: [0.9358192 0.9358192 0.93513953 0.93513953 0.93513953 0.93513953 0.93513953 0.95318232 0.93587381 0.92628095] mean value: 0.9362673146575943 key: test_accuracy value: [0.91666667 0.88888889 0.86111111 0.80555556 
0.86111111 0.88888889 0.86111111 0.86111111 0.94285714 0.91428571] mean value: 0.8801587301587301 key: train_accuracy value: [0.97826087 0.97826087 0.97826087 0.97826087 0.97826087 0.97826087 0.97826087 0.98447205 0.97832817 0.9752322 ] mean value: 0.9785858508162991 key: test_fscore value: [0.76923077 0.71428571 0.61538462 0.63157895 0.73684211 0.75 0.66666667 0.70588235 0.83333333 0.76923077] mean value: 0.7192435273704624 key: train_fscore value: [0.94964029 0.94964029 0.94890511 0.94890511 0.94890511 0.94890511 0.94890511 0.96296296 0.94964029 0.94202899] mean value: 0.9498438359224818 key: test_precision value: [0.83333333 0.71428571 0.8 0.54545455 0.63636364 0.75 0.71428571 0.66666667 1. 0.83333333] mean value: 0.7493722943722944 key: train_precision value: [0.94285714 0.94285714 0.94202899 0.94202899 0.94202899 0.94202899 0.94202899 0.97014925 0.94285714 0.94202899] mean value: 0.945089459534625 key: test_recall value: [0.71428571 0.71428571 0.5 0.75 0.875 0.75 0.625 0.75 0.71428571 0.71428571] mean value: 0.7107142857142857 key: train_recall value: [0.95652174 0.95652174 0.95588235 0.95588235 0.95588235 0.95588235 0.95588235 0.95588235 0.95652174 0.94202899] mean value: 0.954688832054561 key: test_roc_auc value: [0.83990148 0.8226601 0.73214286 0.78571429 0.86607143 0.83928571 0.77678571 0.82142857 0.85714286 0.83928571] mean value: 0.8180418719211823 key: train_roc_auc value: [0.97035573 0.97035573 0.97006716 0.97006716 0.97006716 0.97006716 0.97006716 0.97400417 0.97038685 0.96314048] mean value: 0.9698578765482728 key: test_jcc value: [0.625 0.55555556 0.44444444 0.46153846 0.58333333 0.6 0.5 0.54545455 0.71428571 0.625 ] mean value: 0.5654612054612055 key: train_jcc value: [0.90410959 0.90410959 0.90277778 0.90277778 0.90277778 0.90277778 0.90277778 0.92857143 0.90410959 0.89041096] mean value: 0.9045200043487714 MCC on Blind test: 0.68 Accuracy on Blind test: 0.89 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic 
Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01387548 0.01329589 0.00991011 0.01070857 0.00993299 0.00958133 0.01013231 0.00957942 0.00979066 0.00977588] mean value: 0.01065826416015625 key: score_time value: [0.01285505 0.01248264 0.00921583 0.00976205 0.0090754 0.00870442 0.00901771 0.0088408 0.00888538 0.00878716] mean value: 0.00976264476776123 key: test_mcc value: [0.49629167 0.2085873 0.66143783 0.55814043 0.35714286 0.44883281 0.46291005 0.17173552 0.61237244 0.64285714] mean value: 0.4620308033163919 key: train_mcc value: [0.53438367 0.60265353 0.54665085 0.50633817 0.57652074 0.53118814 0.55255244 0.57652074 0.53762725 0.53762725] mean value: 0.5502062784000151 key: test_accuracy value: [0.86111111 0.77777778 0.88888889 0.86111111 0.77777778 0.83333333 0.80555556 0.75 0.88571429 0.88571429] mean value: 0.8326984126984127 key: train_accuracy value: [0.85714286 0.8757764 0.86024845 0.85093168 0.86956522 0.85714286 0.86335404 0.86956522 0.85758514 0.85758514] mean value: 0.8618896986712306 key: test_fscore value: [0.54545455 0.33333333 0.66666667 0.61538462 0.5 0.5 0.58823529 0.30769231 0.6 0.71428571] mean value: 0.537105247693483 key: train_fscore value: [0.60344828 0.66666667 0.62184874 0.57894737 0.6440678 0.60344828 0.62068966 0.6440678 0.61016949 0.61016949] mean value: 0.6203523557751256 key: test_precision 
value: [0.75 0.4 1. 0.8 0.5 0.75 0.55555556 0.4 1. 0.71428571] mean value: 0.686984126984127 key: train_precision value: [0.74468085 0.78431373 0.7254902 0.7173913 0.76 0.72916667 0.75 0.76 0.73469388 0.73469388] mean value: 0.7440430498748991 key: test_recall value: [0.42857143 0.28571429 0.5 0.5 0.5 0.375 0.625 0.25 0.42857143 0.71428571] mean value: 0.4607142857142857 key: train_recall value: [0.50724638 0.57971014 0.54411765 0.48529412 0.55882353 0.51470588 0.52941176 0.55882353 0.52173913 0.52173913] mean value: 0.5321611253196931 key: test_roc_auc value: [0.69704433 0.591133 0.75 0.73214286 0.67857143 0.66964286 0.74107143 0.57142857 0.71428571 0.82142857] mean value: 0.6966748768472907 key: train_roc_auc value: [0.72990777 0.76811594 0.74449977 0.71705651 0.75578972 0.73176239 0.74108384 0.75578972 0.73527901 0.73527901] mean value: 0.7414563679569117 key: test_jcc value: [0.375 0.2 0.5 0.44444444 0.33333333 0.33333333 0.41666667 0.18181818 0.42857143 0.55555556] mean value: 0.37687229437229436 key: train_jcc value: [0.43209877 0.5 0.45121951 0.40740741 0.475 0.43209877 0.45 0.475 0.43902439 0.43902439] mean value: 0.4500873230954532 MCC on Blind test: 0.56 Accuracy on Blind test: 0.87 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01311135 0.01836658 0.01801801 0.02214932 0.0201776 0.02150178 0.03540277 0.02377319 0.01945472 0.02329373] mean value: 0.021524906158447266 key: score_time value: [0.00971651 0.01126003 0.01196456 0.01231813 0.01482916 0.0148766 0.01425123 0.01228762 0.01243329 0.01276159] mean value: 0.012669873237609864 key: test_mcc value: [0.85096294 0.72192954 0.44883281 0.91914503 0.41267736 0.31622777 0.37067856 0.6172134 0.61237244 0.72019314] mean value: 0.5990232995602679 key: train_mcc value: [0.84986344 0.90206627 0.77744561 0.94588078 0.86625969 0.8365424 0.93513953 0.94378174 0.81007791 0.92534731] mean value: 0.8792404687832709 key: test_accuracy value: [0.94444444 0.91666667 0.83333333 0.97222222 0.80555556 0.80555556 0.75 0.86111111 0.88571429 0.91428571] mean value: 0.8688888888888889 key: train_accuracy value: [0.95031056 0.96583851 0.92857143 0.98136646 0.95652174 0.94720497 0.97826087 0.98136646 0.9380805 0.9752322 ] mean value: 0.9602753687287272 key: test_fscore value: [0.875 0.76923077 0.5 0.93333333 0.53333333 0.22222222 0.52631579 0.70588235 0.6 0.76923077] mean value: 0.6434548569765288 key: train_fscore value: [0.88059701 0.92307692 0.8 0.95714286 0.890625 0.864 0.94890511 0.95384615 0.83333333 0.94029851] mean value: 0.8991824899276378 key: test_precision value: [0.77777778 0.83333333 0.75 1. 0.57142857 1. 0.45454545 0.66666667 1. 0.83333333] mean value: 0.7887085137085137 key: train_precision value: [0.90769231 0.89189189 0.9787234 0.93055556 0.95 0.94736842 0.94202899 1. 
0.98039216 0.96923077] mean value: 0.9497883492048467 key: test_recall value: [1. 0.71428571 0.375 0.875 0.5 0.125 0.625 0.75 0.42857143 0.71428571] mean value: 0.6107142857142858 key: train_recall value: [0.85507246 0.95652174 0.67647059 0.98529412 0.83823529 0.79411765 0.95588235 0.91176471 0.72463768 0.91304348] mean value: 0.8611040068201193 key: test_roc_auc value: [0.96551724 0.83990148 0.66964286 0.9375 0.69642857 0.5625 0.70535714 0.82142857 0.71428571 0.83928571] mean value: 0.7751847290640395 key: train_roc_auc value: [0.91567852 0.96245059 0.83626679 0.98280454 0.91321214 0.89115331 0.97006716 0.95588235 0.86035034 0.95258473] mean value: 0.9240450475107724 key: test_jcc value: [0.77777778 0.625 0.33333333 0.875 0.36363636 0.125 0.35714286 0.54545455 0.42857143 0.625 ] mean value: 0.5055916305916306 key: train_jcc value: [0.78666667 0.85714286 0.66666667 0.91780822 0.8028169 0.76056338 0.90277778 0.91176471 0.71428571 0.88732394] mean value: 0.820781683295223 MCC on Blind test: 0.72 Accuracy on Blind test: 0.91 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01766181 0.01630974 0.0205574 0.01550651 0.01691341 0.01607966 0.01694655 0.01806903 0.01713586 0.01994872] mean value: 0.017512869834899903 key: score_time value: [0.0132618 0.01240635 0.01228547 0.0124495 0.01213622 0.01185417 0.01222014 0.01217818 0.01196337 0.01226044] mean value: 0.01230156421661377 key: test_mcc value: [0.58131836 0.34404556 0.67005939 0.75134288 0.51785714 0.5976143 0.50560765 0.51785714 0.35478744 0.61237244] mean value: 0.5452862312081964 key: train_mcc value: [0.77957604 0.49277338 0.66452587 0.84671817 0.85667348 0.43962631 0.72241165 0.87641313 0.69158946 0.89676152] mean value: 0.7267069018892041 key: test_accuracy value: [0.77777778 0.83333333 0.83333333 0.91666667 0.83333333 0.77777778 0.69444444 0.83333333 0.71428571 0.88571429] mean value: 0.81 key: train_accuracy value: [0.91614907 0.84782609 0.83850932 0.95031056 0.95341615 0.63043478 0.8757764 0.95962733 0.85139319 0.96594427] mean value: 0.8789387150741304 key: test_fscore value: [0.63636364 0.25 0.72727273 0.76923077 0.625 0.66666667 0.59259259 0.625 0.5 0.66666667] mean value: 0.6058793058793058 key: train_fscore value: [0.82580645 0.4494382 0.72043011 0.875 0.88372093 0.53333333 0.77011494 0.896 0.74193548 0.91603053] mean value: 0.7611809985703716 key: test_precision value: [0.46666667 1. 0.57142857 1. 0.625 0.5 0.42105263 0.625 0.38461538 0.8 ] mean value: 0.6393763254289571 key: train_precision value: [0.74418605 1. 
0.56779661 0.93333333 0.93442623 0.36363636 0.63207547 0.98245614 0.58974359 0.96774194] mean value: 0.7715395720435464 key: test_recall value: [1. 0.14285714 1. 0.625 0.625 1. 1. 0.625 0.71428571 0.57142857] mean value: 0.7303571428571428 key: train_recall value: [0.92753623 0.28985507 0.98529412 0.82352941 0.83823529 1. 0.98529412 0.82352941 1. 0.86956522] mean value: 0.8542838874680307 key: test_roc_auc value: [0.86206897 0.57142857 0.89285714 0.8125 0.75892857 0.85714286 0.80357143 0.75892857 0.71428571 0.76785714] mean value: 0.7799568965517242 key: train_roc_auc value: [0.92028986 0.64492754 0.89225336 0.90389069 0.91124363 0.76574803 0.91587541 0.9097962 0.90551181 0.9308456 ] mean value: 0.8700382121352478 key: test_jcc value: [0.46666667 0.14285714 0.57142857 0.625 0.45454545 0.5 0.42105263 0.45454545 0.33333333 0.5 ] mean value: 0.4469429254955571 key: train_jcc value: [0.7032967 0.28985507 0.56302521 0.77777778 0.79166667 0.36363636 0.62616822 0.8115942 0.58974359 0.84507042] mean value: 0.6361834233401731 MCC on Blind test: 0.53 Accuracy on Blind test: 0.79 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.15819955 0.14830995 0.14526677 0.13937092 0.14099097 0.13902879 0.1481483 0.1477356 0.14789319 0.14681125] mean value: 0.14617552757263183 key: score_time value: [0.01697183 0.01665115 0.01544809 0.01664472 0.01590967 0.0155158 0.01666117 0.0167737 0.01641393 0.01669216] mean value: 0.0163682222366333 key: test_mcc value: [1. 0.91914503 0.91914503 0.91914503 0.80582296 0.75134288 0.9258201 0.86189161 0.49391458 0.81649658] mean value: 0.8412723806521708 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.97222222 0.97222222 0.97222222 0.91666667 0.91666667 0.97222222 0.94444444 0.85714286 0.94285714] mean value: 0.9466666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.93333333 0.93333333 0.93333333 0.84210526 0.76923077 0.94117647 0.88888889 0.54545455 0.83333333] mean value: 0.8620189270653666 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.875 1. 1. 0.72727273 1. 0.88888889 0.8 0.75 1. ] mean value: 0.9041161616161616 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.875 0.875 1. 0.625 1. 1. 0.42857143 0.71428571] mean value: 0.8517857142857143 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.98275862 0.9375 0.9375 0.94642857 0.8125 0.98214286 0.96428571 0.69642857 0.85714286] mean value: 0.9116687192118227 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 
1.] mean value: 1.0 key: test_jcc value: [1. 0.875 0.875 0.875 0.72727273 0.625 0.88888889 0.8 0.375 0.71428571] mean value: 0.775544733044733 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.94 Accuracy on Blind test: 0.98 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05724621 0.0500145 0.0438199 0.07104373 0.0596118 0.06134391 0.06512403 0.070961 0.0410428 0.05038881] mean value: 0.05705966949462891 key: score_time value: [0.02019477 0.02353168 0.0291779 0.04015303 0.0278511 0.03402472 0.0242095 0.04008889 0.02148032 0.02989411] mean value: 0.02906060218811035 key: test_mcc value: [1. 0.91914503 1. 0.91914503 0.80582296 0.83666003 0.77151675 0.86189161 0.49391458 0.81649658] mean value: 0.8424592569278723 key: train_mcc value: [0.98155468 0.98155468 0.97192696 0.99077106 0.99077106 0.9906716 0.97196923 0.97192696 0.96368577 1. ] mean value: 0.981483200152367 key: test_accuracy value: [1. 0.97222222 1. 0.97222222 0.91666667 0.94444444 0.91666667 0.94444444 0.85714286 0.94285714] mean value: 0.9466666666666667 key: train_accuracy value: [0.99378882 0.99378882 0.99068323 0.99689441 0.99689441 0.99689441 0.99068323 0.99068323 0.9876161 1. ] mean value: 0.9937926658077418 key: test_fscore value: [1. 0.93333333 1. 0.93333333 0.84210526 0.85714286 0.82352941 0.88888889 0.54545455 0.83333333] mean value: 0.8657120966408892 key: train_fscore value: [0.98550725 0.98550725 0.97777778 0.99270073 0.99270073 0.99259259 0.97744361 0.97777778 0.97142857 1. ] mean value: 0.9853436281206914 key: test_precision value: [1. 0.875 1. 1. 0.72727273 1. 0.77777778 0.8 0.75 1. ] mean value: 0.8930050505050505 key: train_precision value: [0.98550725 0.98550725 0.98507463 0.98550725 0.98550725 1. 1. 0.98507463 0.95774648 1. ] mean value: 0.9869924718111829 key: test_recall value: [1. 1. 1. 0.875 1. 0.75 0.875 1. 
0.42857143 0.71428571] mean value: 0.8642857142857143 key: train_recall value: [0.98550725 0.98550725 0.97058824 1. 1. 0.98529412 0.95588235 0.97058824 0.98550725 1. ] mean value: 0.9838874680306906 key: test_roc_auc value: [1. 0.98275862 1. 0.9375 0.94642857 0.875 0.90178571 0.96428571 0.69642857 0.85714286] mean value: 0.9161330049261084 key: train_roc_auc value: [0.99077734 0.99077734 0.98332561 0.9980315 0.9980315 0.99264706 0.97794118 0.98332561 0.98684811 1. ] mean value: 0.9901705243424437 key: test_jcc value: [1. 0.875 1. 0.875 0.72727273 0.75 0.7 0.8 0.375 0.71428571] mean value: 0.7816558441558441 key: train_jcc value: [0.97142857 0.97142857 0.95652174 0.98550725 0.98550725 0.98529412 0.95588235 0.95652174 0.94444444 1. ] mean value: 0.9712536028904315 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, 
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.08166194 0.14968729 0.11179137 0.07614708 0.08787417 0.1285069 0.12228823 0.12631869 0.09615564 0.09222722] mean value: 0.10726585388183593 key: score_time value: [0.02172256 0.02620792 0.02364445 0.01462293 0.022928 0.02531552 0.02583742 0.02750945 0.02175593 0.02795911] mean value: 0.02375032901763916 key: test_mcc value: [ 0.1872493 0.1872493 -0.09035079 0. -0.12964074 0. 0.2362278 0.45374261 -0.08574929 0. ] mean value: 0.07587281768515772 key: train_mcc value: [0.91634855 0.93507164 0.93434457 0.90588785 0.89634849 0.91539921 0.93434457 0.90588785 0.90701894 0.91641052] mean value: 0.9167062185712945 key: test_accuracy value: [0.80555556 0.80555556 0.75 0.77777778 0.72222222 0.77777778 0.77777778 0.83333333 0.77142857 0.8 ] mean value: 0.7821428571428571 key: train_accuracy value: [0.97204969 0.97826087 0.97826087 0.9689441 0.96583851 0.97204969 0.97826087 0.9689441 0.96904025 0.97213622] mean value: 0.972378516624041 key: test_fscore value: [0.22222222 0.22222222 0. 0. 0. 0. 0.33333333 0.4 0. 0. ] mean value: 0.11777777777777779 key: train_fscore value: [0.93023256 0.94656489 0.94573643 0.92063492 0.912 0.92913386 0.94573643 0.92063492 0.921875 0.93023256] mean value: 0.9302781569529865 key: test_precision value: [0.5 0.5 0. 0. 0. 0. 0.5 1. 0. 0. ] mean value: 0.25 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.14285714 0.14285714 0. 0. 0. 0. 0.25 0.25 0. 0. 
] mean value: 0.07857142857142857 key: train_recall value: [0.86956522 0.89855072 0.89705882 0.85294118 0.83823529 0.86764706 0.89705882 0.85294118 0.85507246 0.86956522] mean value: 0.8698635976129583 key: test_roc_auc value: [0.55418719 0.55418719 0.48214286 0.5 0.46428571 0.5 0.58928571 0.625 0.48214286 0.5 ] mean value: 0.5251231527093596 key: train_roc_auc value: [0.93478261 0.94927536 0.94852941 0.92647059 0.91911765 0.93382353 0.94852941 0.92647059 0.92753623 0.93478261] mean value: 0.9349317988064791 key: test_jcc value: [0.125 0.125 0. 0. 0. 0. 0.2 0.25 0. 0. ] mean value: 0.07 key: train_jcc value: [0.86956522 0.89855072 0.89705882 0.85294118 0.83823529 0.86764706 0.89705882 0.85294118 0.85507246 0.86956522] mean value: 0.8698635976129583 MCC on Blind test: 0.11 Accuracy on Blind test: 0.78 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, 
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.52069354 0.50328159 0.50455189 0.50025082 0.50435662 0.50333905 0.49607444 0.50226235 0.50641394 0.49668527] mean value: 0.5037909507751465 key: score_time value: [0.00986624 0.00982571 0.01003551 0.00934696 0.00967073 0.00945807 0.0095005 0.00956893 0.01015067 0.01010966] mean value: 0.00975329875946045 key: test_mcc value: [1. 
0.8226601 1. 0.91914503 0.71098137 0.75134288 0.86189161 0.86189161 0.61237244 1. ] mean value: 0.8540285030671064 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.94444444 1. 0.97222222 0.86111111 0.91666667 0.94444444 0.94444444 0.88571429 1. ] mean value: 0.9469047619047619 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.85714286 1. 0.93333333 0.76190476 0.76923077 0.88888889 0.88888889 0.66666667 1. ] mean value: 0.8766056166056166 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.85714286 1. 1. 0.61538462 1. 0.8 0.8 0.8 1. ] mean value: 0.8872527472527473 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.85714286 1. 0.875 1. 0.625 1. 1. 0.57142857 1. ] mean value: 0.8928571428571428 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.91133005 1. 0.9375 0.91071429 0.8125 0.96428571 0.96428571 0.76785714 1. ] mean value: 0.9268472906403941 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.75 1. 0.875 0.61538462 0.625 0.8 0.8 0.5 1. ] mean value: 0.7965384615384615 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.91 Accuracy on Blind test: 0.97 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are
collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02580166 0.02822471 0.02490973 0.04556131 0.0245204 0.02402449 0.02442002 0.02469778 0.02477241 0.02475023] mean value: 0.02716827392578125 key: score_time value: [0.01345968 0.01275158 0.01257443 0.01375508 0.01480269 0.01374626 0.01462483 0.01372385 0.01461887 0.0150516 ] mean value: 0.013910889625549316 key: test_mcc value: [-0.08304548 -0.11915865 -0.16116459 -0.23904572 -0.12964074 -0.3086067 0.32232919 -0.12964074 -0.15309311 0.15161961] mean value: -0.08494469446571944 key: train_mcc value: [0.24048671 0.24048671 0.28810855 0.21675985 0.28810855 0.2663143 0.18742507 0.24272682 0.24058235 0.15144495] mean value: 0.23624438625695043 key: test_accuracy value: [0.77777778 0.75 0.69444444 0.61111111 0.72222222 0.52777778 0.80555556 0.72222222 0.71428571 0.74285714] mean value: 0.7068253968253968 key: train_accuracy value: [0.80124224 0.80124224 0.81055901 0.80124224 0.81055901 0.80745342 0.79813665 0.80434783 0.80185759 0.79256966] mean value: 0.8029209853277696 key: test_fscore value: [0. 0. 0. 0. 0. 0. 0.36363636 0. 0. 
0.30769231] mean value: 0.06713286713286713 key: train_fscore value: [0.13513514 0.13513514 0.18666667 0.11111111 0.18666667 0.16216216 0.08450704 0.1369863 0.13513514 0.05633803] mean value: 0.13298433838044102 key: test_precision value: [0. 0. 0. 0. 0. 0. 0.66666667 0. 0. 0.33333333] mean value: 0.09999999999999999 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0. 0. 0. 0. 0. 0. 0.25 0. 0. 0.28571429] mean value: 0.05357142857142857 key: train_recall value: [0.07246377 0.07246377 0.10294118 0.05882353 0.10294118 0.08823529 0.04411765 0.07352941 0.07246377 0.02898551] mean value: 0.07169650468883206 key: test_roc_auc value: [0.48275862 0.46551724 0.44642857 0.39285714 0.46428571 0.33928571 0.60714286 0.46428571 0.44642857 0.57142857] mean value: 0.4680418719211823 key: train_roc_auc value: [0.53623188 0.53623188 0.55147059 0.52941176 0.55147059 0.54411765 0.52205882 0.53676471 0.53623188 0.51449275] mean value: 0.535848252344416 key: test_jcc value: [0. 0. 0. 0. 0. 0. 0.22222222 0. 0. 
0.18181818] mean value: 0.0404040404040404 key: train_jcc value: [0.07246377 0.07246377 0.10294118 0.05882353 0.10294118 0.08823529 0.04411765 0.07352941 0.07246377 0.02898551] mean value: 0.07169650468883206 MCC on Blind test: 0.02 Accuracy on Blind test: 0.77 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02734208 0.03658342 0.03627729 0.03802681 0.03607988 0.03623199 0.07836962 0.0335741 0.04661655 0.03537488] mean value: 0.040447664260864255 key: score_time value: [0.02119327 0.02037835 0.02254105 0.02403259 0.02666831 0.02385139 0.02655315 0.0224328 0.02348709 0.02180433] mean value: 0.023294234275817872 key: test_mcc value: [1. 0.6144869 0.55814043 0.75134288 0.46291005 0.66143783 0.67857143 0.77151675 0.71842121 0.7484552 ] mean value: 0.6965282676681155 key: train_mcc value: [0.89727565 0.88757529 0.87718604 0.86725712 0.88588911 0.88708251 0.90539133 0.91509932 0.88839586 0.89735962] mean value: 0.89085118703998 key: test_accuracy value: [1. 
0.88888889 0.86111111 0.91666667 0.80555556 0.88888889 0.88888889 0.91666667 0.91428571 0.91428571] mean value: 0.8995238095238095 key: train_accuracy value: [0.96583851 0.96273292 0.95962733 0.95652174 0.96273292 0.96273292 0.9689441 0.97204969 0.9628483 0.96594427] mean value: 0.9639972693883045 key: test_fscore value: [1. 0.66666667 0.61538462 0.76923077 0.58823529 0.66666667 0.75 0.82352941 0.72727273 0.8 ] mean value: 0.7406986151103798 key: train_fscore value: [0.91851852 0.91044776 0.90225564 0.89393939 0.90769231 0.91044776 0.92424242 0.93233083 0.91176471 0.91851852] mean value: 0.9130157857346989 key: test_precision value: [1. 0.8 0.8 1. 0.55555556 1. 0.75 0.77777778 1. 0.75 ] mean value: 0.8433333333333334 key: train_precision value: [0.93939394 0.93846154 0.92307692 0.921875 0.9516129 0.92424242 0.953125 0.95384615 0.92537313 0.93939394] mean value: 0.9370400955969084 key: test_recall value: [1. 0.57142857 0.5 0.625 0.625 0.5 0.75 0.875 0.57142857 0.85714286] mean value: 0.6875 key: train_recall value: [0.89855072 0.88405797 0.88235294 0.86764706 0.86764706 0.89705882 0.89705882 0.91176471 0.89855072 0.89855072] mean value: 0.8903239556692242 key: test_roc_auc value: [1. 0.76847291 0.73214286 0.8125 0.74107143 0.75 0.83928571 0.90178571 0.78571429 0.89285714] mean value: 0.8223830049261084 key: train_roc_auc value: [0.94137022 0.93412385 0.93133395 0.92398101 0.92791802 0.93868689 0.9426239 0.94997684 0.93943284 0.94140135] mean value: 0.9370848871745019 key: test_jcc value: [1. 
0.5 0.44444444 0.625 0.41666667 0.5 0.6 0.7 0.57142857 0.66666667] mean value: 0.6024206349206349 key: train_jcc value: [0.84931507 0.83561644 0.82191781 0.80821918 0.83098592 0.83561644 0.85915493 0.87323944 0.83783784 0.84931507] mean value: 0.8401218119527979 MCC on Blind test: 0.84 Accuracy on Blind test: 0.94 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.24777889 0.24893379 0.25687099 0.27457643 0.29407358 0.27596569 0.33060312 0.30342221 0.28261089 0.29954863] mean value: 0.281438422203064 key: score_time value: [0.025563 0.02152777 0.02327323 0.02313852 0.02621269 0.02568793 0.02634549 0.02385783 0.03024006 0.02442074] mean value: 0.025026726722717284 key: test_mcc value: [1. 0.6144869 0.55814043 0.75032247 0.46291005 0.66143783 0.67857143 0.77151675 0.71842121 0.7484552 ] mean value: 0.6964262266368373 key: train_mcc value: [0.9358192 0.88757529 0.87718604 0.93513953 0.88588911 0.88708251 0.94407133 0.94407133 0.88839586 0.89735962] mean value: 0.9082589838760209 key: test_accuracy value: [1. 
0.88888889 0.86111111 0.91666667 0.80555556 0.88888889 0.88888889 0.91666667 0.91428571 0.91428571] mean value: 0.8995238095238095 key: train_accuracy value: [0.97826087 0.96273292 0.95962733 0.97826087 0.96273292 0.96273292 0.98136646 0.98136646 0.9628483 0.96594427] mean value: 0.9695873315001058 key: test_fscore value: [1. 0.66666667 0.61538462 0.8 0.58823529 0.66666667 0.75 0.82352941 0.72727273 0.8 ] mean value: 0.7437755381873029 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:107: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:110: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.94964029 0.91044776 0.90225564 0.94890511 0.90769231 0.91044776 0.95588235 0.95588235 0.91176471 0.91851852] mean value: 0.9271436796720172 key: test_precision value: [1. 0.8 0.8 0.85714286 0.55555556 1. 0.75 0.77777778 1. 0.75 ] mean value: 0.829047619047619 key: train_precision value: [0.94285714 0.93846154 0.92307692 0.94202899 0.9516129 0.92424242 0.95588235 0.95588235 0.92537313 0.93939394] mean value: 0.9398811696975732 key: test_recall value: [1. 
0.57142857 0.5 0.75 0.625 0.5 0.75 0.875 0.57142857 0.85714286] mean value: 0.7 key: train_recall value: [0.95652174 0.88405797 0.88235294 0.95588235 0.86764706 0.89705882 0.95588235 0.95588235 0.89855072 0.89855072] mean value: 0.9152387041773231 key: test_roc_auc value: [1. 0.76847291 0.73214286 0.85714286 0.74107143 0.75 0.83928571 0.90178571 0.78571429 0.89285714] mean value: 0.8268472906403941 key: train_roc_auc value: [0.97035573 0.93412385 0.93133395 0.97006716 0.92791802 0.93868689 0.97203566 0.97203566 0.93943284 0.94140135] mean value: 0.9497391118222522 key: test_jcc value: [1. 0.5 0.44444444 0.66666667 0.41666667 0.5 0.6 0.7 0.57142857 0.66666667] mean value: 0.6065873015873016 key: train_jcc value: [0.90410959 0.83561644 0.82191781 0.90277778 0.83098592 0.83561644 0.91549296 0.91549296 0.83783784 0.84931507] mean value: 0.8649162789067284 MCC on Blind test: 0.76 Accuracy on Blind test: 0.92 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03773212 0.03759193 0.03757524 0.03828263 0.0400517 0.03780508 0.03701949 0.04745054 0.04011869 0.03937149] mean value: 0.03929989337921143 key: score_time value: [0.01247263 0.01345062 0.01343799 0.01589584 0.01312828 0.01231027 0.01372886 0.01359797 0.01646304 0.01724863] mean value: 0.014173412322998047 key: test_mcc value: [0.82942474 0.75492611 0.75492611 0.92980296 0.75434227 0.89802651 0.75434227 0.82618439 0.85933785 0.93094934] mean value: 0.8292262532717305 key: train_mcc value: [0.9094503 0.921366 0.90138807 0.89754406 0.9212884 0.92916266 0.91738682 0.90945587 0.91732994 0.90562412] mean value: 0.9129996244909306 key: test_accuracy value: [0.9122807 0.87719298 0.87719298 0.96491228 0.875 0.94642857 0.875 0.91071429 0.92857143 0.96428571] mean value: 0.9131578947368421 key: train_accuracy value: [0.95463511 0.96055227 0.95069034 0.94871795 0.96062992 0.96456693 0.95866142 0.95472441 0.95866142 0.95275591] mean value: 0.9564595660749506 key: test_fscore value: [0.91525424 0.87719298 0.87719298 0.96551724 0.88135593 0.94339623 0.88135593 0.91525424 0.92592593 0.96551724] mean value: 0.9147962938994972 key: train_fscore value: [0.95427435 0.96015936 0.95069034 0.94820717 0.96047431 0.96442688 0.95841584 0.95463511 0.95857988 0.95238095] mean value: 0.956224419292093 key: test_precision value: [0.87096774 0.86206897 0.89285714 0.96551724 0.83870968 1. 
0.83870968 0.87096774 0.96153846 0.93333333] mean value: 0.9034669983335167 key: train_precision value: [0.96385542 0.97177419 0.9488189 0.95582329 0.96428571 0.96825397 0.96414343 0.95652174 0.96047431 0.96 ] mean value: 0.9613950962310953 key: test_recall value: [0.96428571 0.89285714 0.86206897 0.96551724 0.92857143 0.89285714 0.92857143 0.96428571 0.89285714 1. ] mean value: 0.9291871921182266 key: train_recall value: [0.94488189 0.9488189 0.95256917 0.94071146 0.95669291 0.96062992 0.95275591 0.95275591 0.95669291 0.94488189] mean value: 0.951139086863154 key: test_roc_auc value: [0.91317734 0.87746305 0.87746305 0.96490148 0.875 0.94642857 0.875 0.91071429 0.92857143 0.96428571] mean value: 0.9133004926108375 key: train_roc_auc value: [0.95465438 0.96057546 0.95069403 0.94870219 0.96062992 0.96456693 0.95866142 0.95472441 0.95866142 0.95275591] mean value: 0.9564626062058448 key: test_jcc value: [0.84375 0.78125 0.78125 0.93333333 0.78787879 0.89285714 0.78787879 0.84375 0.86206897 0.93333333] mean value: 0.8447350350798627 key: train_jcc value: [0.91254753 0.92337165 0.90601504 0.90151515 0.92395437 0.93129771 0.92015209 0.91320755 0.92045455 0.90909091] mean value: 0.9161606540653082 MCC on Blind test: 0.72 Accuracy on Blind test: 0.9 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.94869375 0.87011933 1.19105005 0.99319768 0.97389174 0.92805409 0.93025017 0.93565726 0.83603239 1.05092835] mean value: 0.9657874822616577 key: score_time value: [0.01407504 0.01457548 0.01895332 0.02009797 0.01391673 0.01355195 0.01422977 0.01349163 0.01395845 0.01340699] mean value: 0.015025734901428223 key: test_mcc value: [0.82942474 0.8951918 0.86189955 0.96547546 0.85933785 0.93094934 0.93094934 0.93094934 0.89342711 0.93094934] mean value: 0.9028553849533496 key: train_mcc value: [0.98817342 0.98028353 0.98817323 0.98817323 0.99607071 0.99212598 0.98819663 1. 0.99212598 0.98032256] mean value: 0.9893645282093079 key: test_accuracy value: [0.9122807 0.94736842 0.92982456 0.98245614 0.92857143 0.96428571 0.96428571 0.96428571 0.94642857 0.96428571] mean value: 0.9504072681704261 key: train_accuracy value: [0.99408284 0.99013807 0.99408284 0.99408284 0.9980315 0.99606299 0.99409449 1. 
0.99606299 0.99015748] mean value: 0.9946796036590101 key: test_fscore value: [0.91525424 0.94545455 0.92857143 0.98305085 0.93103448 0.96296296 0.96551724 0.96551724 0.94736842 0.96551724] mean value: 0.9510248649683883 key: train_fscore value: [0.99408284 0.99017682 0.99405941 0.99405941 0.99802761 0.99606299 0.99408284 1. 0.99606299 0.99017682] mean value: 0.9946791724596361 key: test_precision value: [0.87096774 0.96296296 0.96296296 0.96666667 0.9 1. 0.93333333 0.93333333 0.93103448 0.93333333] mean value: 0.9394594817286697 key: train_precision value: [0.99604743 0.98823529 0.99603175 0.99603175 1. 0.99606299 0.99604743 1. 0.99606299 0.98823529] mean value: 0.9952754926210834 key: test_recall value: [0.96428571 0.92857143 0.89655172 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9646551724137931 key: train_recall value: [0.99212598 0.99212598 0.99209486 0.99209486 0.99606299 0.99606299 0.99212598 1. 0.99606299 0.99212598] mean value: 0.9940882636705985 key: test_roc_auc value: [0.91317734 0.94704433 0.93041872 0.98214286 0.92857143 0.96428571 0.96428571 0.96428571 0.94642857 0.96428571] mean value: 0.9504926108374385 key: train_roc_auc value: [0.99408671 0.99013414 0.99407893 0.99407893 0.9980315 0.99606299 0.99409449 1. 0.99606299 0.99015748] mean value: 0.9946788148517008 key: test_jcc value: [0.84375 0.89655172 0.86666667 0.96666667 0.87096774 0.92857143 0.93333333 0.93333333 0.9 0.93333333] mean value: 0.9073174227978177 key: train_jcc value: [0.98823529 0.98054475 0.98818898 0.98818898 0.99606299 0.99215686 0.98823529 1. 
0.99215686 0.98054475] mean value: 0.9894314752770804 MCC on Blind test: 0.81 Accuracy on Blind test: 0.93 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01455855 0.01241016 0.01061916 0.01032066 0.01021433 0.00997758 0.01221704 0.01006413 0.01007271 0.01009989] mean value: 0.011055421829223634 key: score_time value: [0.01239634 0.00947404 0.00947428 0.00910234 0.00912118 0.00902677 0.00919771 0.00907707 0.00964546 0.00893784] mean value: 0.009545302391052246 key: test_mcc value: [0.47713554 0.553659 0.75462449 0.54592083 0.60753044 0.64285714 0.5118907 0.67082039 0.39310793 0.5118907 ] mean value: 0.5669437167779643 key: train_mcc value: [0.61830137 0.59162207 0.59295071 0.61416745 0.67031032 0.64585416 0.65661014 0.65225378 0.62955117 0.6032316 ] mean value: 0.6274852770592964 key: test_accuracy value: [0.73684211 0.77192982 0.87719298 0.77192982 0.80357143 0.82142857 0.75 0.82142857 0.69642857 0.75 ] mean value: 0.7800751879699248 key: train_accuracy value: [0.80473373 0.78303748 0.79487179 0.80473373 0.83464567 0.81889764 0.82480315 0.82283465 0.81299213 0.7992126 ] mean value: 0.8100762552609918 key: test_fscore value: [0.74576271 0.78688525 0.88135593 0.78688525 0.8 0.82142857 0.77419355 0.84375 0.70175439 
0.77419355] mean value: 0.7916209190038752 key: train_fscore value: [0.82032668 0.81099656 0.80451128 0.81564246 0.82995951 0.83211679 0.83669725 0.83455882 0.82242991 0.81111111] mean value: 0.821835037001602 key: test_precision value: [0.70967742 0.72727273 0.86666667 0.75 0.81481481 0.82142857 0.70588235 0.75 0.68965517 0.70588235] mean value: 0.7541280077833765 key: train_precision value: [0.76094276 0.7195122 0.76702509 0.77112676 0.85416667 0.7755102 0.78350515 0.78275862 0.78291815 0.76573427] mean value: 0.7763199867511414 key: test_recall value: [0.78571429 0.85714286 0.89655172 0.82758621 0.78571429 0.82142857 0.85714286 0.96428571 0.71428571 0.85714286] mean value: 0.8366995073891625 key: train_recall value: [0.88976378 0.92913386 0.8458498 0.86561265 0.80708661 0.8976378 0.8976378 0.89370079 0.86614173 0.86220472] mean value: 0.8754769537207059 key: test_roc_auc value: [0.73768473 0.77339901 0.87684729 0.77093596 0.80357143 0.82142857 0.75 0.82142857 0.69642857 0.75 ] mean value: 0.7801724137931034 key: train_roc_auc value: [0.80456568 0.78274875 0.79497215 0.80485357 0.83464567 0.81889764 0.82480315 0.82283465 0.81299213 0.7992126 ] mean value: 0.8100525971802932 key: test_jcc value: [0.59459459 0.64864865 0.78787879 0.64864865 0.66666667 0.6969697 0.63157895 0.72972973 0.54054054 0.63157895] mean value: 0.6576835208414156 key: train_jcc value: [0.69538462 0.68208092 0.67295597 0.68867925 0.70934256 0.7125 0.7192429 0.71608833 0.6984127 0.68224299] mean value: 0.6976930240270341 MCC on Blind test: 0.2 Accuracy on Blind test: 0.67 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01165748 0.01036167 0.0107286 0.0104363 0.01048374 0.01035953 0.0104208 0.01058149 0.01036954 0.01047564] mean value: 0.010587477684020996 key: score_time value: [0.00932479 0.00897956 0.00896621 0.00899768 0.00909901 0.00911689 0.00907898 0.00910473 0.00895834 0.00902915] mean value: 0.009065532684326172 key: test_mcc value: [0.58562417 0.62473685 0.50927421 0.57973205 0.60753044 0.71428571 0.64951905 0.72168784 0.67900461 0.39310793] mean value: 0.6064502839116733 key: train_mcc value: [0.63864108 0.67343572 0.67495523 0.65362362 0.63188315 0.6387663 0.65228602 0.64665231 0.67097829 0.6472967 ] mean value: 0.6528518419173023 key: test_accuracy value: [0.78947368 0.80701754 0.75438596 0.78947368 0.80357143 0.85714286 0.82142857 0.85714286 0.83928571 0.69642857] mean value: 0.8015350877192983 key: train_accuracy value: [0.81854043 0.83629191 0.83629191 0.82642998 0.81496063 0.81889764 0.82480315 0.82283465 0.83464567 0.82283465] mean value: 0.8256530618583919 key: test_fscore value: [0.8 0.81967213 0.76666667 0.8 0.80701754 0.85714286 0.83333333 0.86666667 0.83636364 0.69090909] mean value: 0.8077771926089441 key: train_fscore value: [0.82509506 0.84069098 0.84250474 0.83011583 0.8219697 0.82375479 0.83239171 0.82758621 0.84030418 0.82889734] mean value: 0.8313310537668297 key: test_precision value: [0.75 0.75757576 0.74193548 0.77419355 0.79310345 0.85714286 0.78125 0.8125 0.85185185 0.7037037 ] mean value: 0.7823256650808097 key: train_precision value: [0.79779412 0.82022472 0.81021898 0.81132075 0.7919708 0.80223881 0.79783394 0.80597015 0.8125 0.80147059] mean 
value: 0.8051542850964287 key: test_recall value: [0.85714286 0.89285714 0.79310345 0.82758621 0.82142857 0.85714286 0.89285714 0.92857143 0.82142857 0.67857143] mean value: 0.8370689655172414 key: train_recall value: [0.85433071 0.86220472 0.87747036 0.84980237 0.85433071 0.84645669 0.87007874 0.8503937 0.87007874 0.85826772] mean value: 0.8593414459556192 key: test_roc_auc value: [0.79064039 0.80849754 0.75369458 0.7887931 0.80357143 0.85714286 0.82142857 0.85714286 0.83928571 0.69642857] mean value: 0.8016625615763546 key: train_roc_auc value: [0.8184697 0.8362407 0.83637297 0.82647599 0.81496063 0.81889764 0.82480315 0.82283465 0.83464567 0.82283465] mean value: 0.8256535744296786 key: test_jcc value: [0.66666667 0.69444444 0.62162162 0.66666667 0.67647059 0.75 0.71428571 0.76470588 0.71875 0.52777778] mean value: 0.6801389362051127 key: train_jcc value: [0.70226537 0.72516556 0.72786885 0.70957096 0.6977492 0.70032573 0.71290323 0.70588235 0.72459016 0.70779221] mean value: 0.7114113624151682 MCC on Blind test: 0.31 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00955963 0.01076388 0.0108633 0.01068926 0.01065207 0.01071215 0.00975442 0.01087141 0.01065636 0.01067686] mean value: 0.010519933700561524 key: score_time value: [0.01791644 0.01392055 0.01368237 0.01361632 0.01391411 0.01670647 0.01601553 0.01230955 0.01352668 0.01577139] mean value: 0.014737939834594727 key: test_mcc value: [0.62473685 0.82490815 0.47519927 0.65018988 0.72168784 0.72168784 0.65814518 0.8660254 0.71611487 0.68250015] mean value: 0.6941195430259357 key: train_mcc value: [0.82324487 0.81065015 0.84223222 0.79510329 0.80709287 0.80337378 0.79936749 0.79163927 0.81142619 0.83148876] mean value: 0.8115618890001493 key: test_accuracy value: [0.80701754 0.9122807 0.73684211 0.8245614 0.85714286 0.85714286 0.82142857 0.92857143 0.85714286 0.83928571] mean value: 0.844141604010025 key: train_accuracy value: [0.9112426 0.90532544 0.92110454 0.8974359 0.90354331 0.9015748 0.8996063 0.89566929 0.90551181 0.91535433] mean value: 0.9056368323782013 key: test_fscore value: [0.81967213 0.90909091 0.75409836 0.83333333 0.86666667 0.86666667 0.83870968 0.93333333 0.85185185 0.83018868] mean value: 0.8503611609410677 key: train_fscore value: [0.9132948 0.90551181 0.92063492 0.8984375 0.90335306 0.90272374 0.9005848 0.89708738 0.90697674 0.91714836] mean value: 0.9065753102337704 key: test_precision value: [0.75757576 0.92592593 0.71875 0.80645161 0.8125 0.8125 0.76470588 0.875 0.88461538 0.88 ] mean value: 0.8238024563373235 key: train_precision value: [0.89433962 0.90551181 0.92430279 0.88803089 0.90513834 0.89230769 0.89189189 0.88505747 0.89312977 
0.89811321] mean value: 0.8977823484465078 key: test_recall value: [0.89285714 0.89285714 0.79310345 0.86206897 0.92857143 0.92857143 0.92857143 1. 0.82142857 0.78571429] mean value: 0.8833743842364532 key: train_recall value: [0.93307087 0.90551181 0.91699605 0.90909091 0.9015748 0.91338583 0.90944882 0.90944882 0.92125984 0.93700787] mean value: 0.9156795617939062 key: test_roc_auc value: [0.80849754 0.91194581 0.73583744 0.82389163 0.85714286 0.85714286 0.82142857 0.92857143 0.85714286 0.83928571] mean value: 0.844088669950739 key: train_roc_auc value: [0.91119946 0.90532508 0.92109645 0.89745884 0.90354331 0.9015748 0.8996063 0.89566929 0.90551181 0.91535433] mean value: 0.9056339671967881 key: test_jcc value: [0.69444444 0.83333333 0.60526316 0.71428571 0.76470588 0.76470588 0.72222222 0.875 0.74193548 0.70967742] mean value: 0.742557354011214 key: train_jcc value: [0.84042553 0.82733813 0.85294118 0.81560284 0.82374101 0.82269504 0.81914894 0.81338028 0.82978723 0.84697509] mean value: 0.8292035258287433 MCC on Blind test: 0.4 Accuracy on Blind test: 0.8 Model_name: SVM Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.021698 0.0257144 0.02186465 0.02199745 0.02193356 0.02225208 0.02210999 0.02229166 0.02531195 0.02225661] mean value: 0.022743034362792968 key: score_time value: [0.01183033 0.01237917 0.01338649 0.01215816 0.01205087 0.01202941 0.01245189 0.01203823 0.01343703 0.01191998] mean value: 0.012368154525756837 key: test_mcc value: [0.79778885 0.68472906 0.68736396 0.9321832 0.75047877 0.82195294 0.73127242 0.73127242 0.75434227 0.85933785] mean value: 0.7750721760268345 key: train_mcc value: [0.83878121 0.84285233 0.85486038 0.8349816 0.8355787 0.84662074 0.83123063 0.84662074 0.83630655 0.8543903 ] mean value: 0.8422223192370651 key: test_accuracy value: [0.89473684 0.84210526 0.84210526 0.96491228 0.875 0.91071429 0.85714286 0.85714286 0.875 0.92857143] mean value: 0.8847431077694236 key: train_accuracy value: [0.91913215 0.92110454 0.9270217 0.91715976 0.91732283 0.92322835 0.91535433 0.92322835 0.91732283 0.92716535] mean value: 0.9208040193200702 key: test_fscore value: [0.9 0.84210526 0.85245902 0.96428571 0.87719298 0.90909091 0.87096774 0.87096774 0.86792453 0.93103448] mean value: 0.8886028380315576 key: train_fscore value: [0.92069632 0.92277992 0.92843327 0.91860465 0.91923077 0.92397661 0.91682785 0.92397661 0.91984733 0.92759295] mean value: 0.9221966289590753 key: test_precision value: [0.84375 0.82758621 0.8125 1. 
0.86206897 0.92592593 0.79411765 0.79411765 0.92 0.9 ] mean value: 0.8680066392457366 key: train_precision value: [0.90494297 0.90530303 0.90909091 0.90114068 0.89849624 0.91505792 0.90114068 0.91505792 0.89259259 0.92217899] mean value: 0.9065001925631475 key: test_recall value: [0.96428571 0.85714286 0.89655172 0.93103448 0.89285714 0.89285714 0.96428571 0.96428571 0.82142857 0.96428571] mean value: 0.9149014778325123 key: train_recall value: [0.93700787 0.94094488 0.9486166 0.93675889 0.94094488 0.93307087 0.93307087 0.93307087 0.9488189 0.93307087] mean value: 0.9385375494071146 key: test_roc_auc value: [0.89593596 0.84236453 0.841133 0.96551724 0.875 0.91071429 0.85714286 0.85714286 0.875 0.92857143] mean value: 0.8848522167487685 key: train_roc_auc value: [0.91909682 0.92106533 0.92706421 0.91719834 0.91732283 0.92322835 0.91535433 0.92322835 0.91732283 0.92716535] mean value: 0.9208046746133018 key: test_jcc value: [0.81818182 0.72727273 0.74285714 0.93103448 0.78125 0.83333333 0.77142857 0.77142857 0.76666667 0.87096774] mean value: 0.8014421055862936 key: train_jcc value: [0.85304659 0.85663082 0.86642599 0.84946237 0.85053381 0.85869565 0.84642857 0.85869565 0.85159011 0.8649635 ] mean value: 0.8556473070988301 MCC on Blind test: 0.68 Accuracy on Blind test: 0.89 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.07555914 2.03910518 1.99554539 1.15945792 2.01181102 2.08725309 2.18683028 2.63542819 2.05218887 2.24368691] mean value: 2.0486865997314454 key: score_time value: [0.01258802 0.0125947 0.01385498 0.01252556 0.02199531 0.01386309 0.0126183 0.05426383 0.01399326 0.01492596] mean value: 0.018322300910949708 key: test_mcc value: [0.76689254 0.82490815 0.85960591 1. 0.78772636 0.89342711 0.8660254 0.89802651 0.8660254 0.93094934] mean value: 0.8693586722821179 key: train_mcc value: [0.99606293 0.99606293 0.99606299 0.98425123 1. 0.99607071 1. 1. 0.99607071 0.99607071] mean value: 0.9960652223608333 key: test_accuracy value: [0.87719298 0.9122807 0.92982456 1. 0.89285714 0.94642857 0.92857143 0.94642857 0.92857143 0.96428571] mean value: 0.9326441102756893 key: train_accuracy value: [0.99802761 0.99802761 0.99802761 0.99211045 1. 0.9980315 1. 1. 0.9980315 0.9980315 ] mean value: 0.9980287782074578 key: test_fscore value: [0.8852459 0.90909091 0.93103448 1. 0.89655172 0.94545455 0.93333333 0.94915254 0.92307692 0.96551724] mean value: 0.9338457603243798 key: train_fscore value: [0.99803536 0.99803536 0.99802761 0.99206349 1. 0.99803536 1. 1. 
0.99803536 0.99803536] mean value: 0.9980267922764523 key: test_precision value: [0.81818182 0.92592593 0.93103448 1. 0.86666667 0.96296296 0.875 0.90322581 1. 0.93333333] mean value: 0.921633099628094 key: train_precision value: [0.99607843 0.99607843 0.99606299 0.99601594 1. 0.99607843 1. 1. 0.99607843 0.99607843] mean value: 0.9972471085243709 key: test_recall value: [0.96428571 0.89285714 0.93103448 1. 0.92857143 0.92857143 1. 1. 0.85714286 1. ] mean value: 0.9502463054187192 key: train_recall value: [1. 1. 1. 0.98814229 1. 1. 1. 1. 1. 1. ] mean value: 0.9988142292490119 key: test_roc_auc value: [0.87869458 0.91194581 0.92980296 1. 0.89285714 0.94642857 0.92857143 0.94642857 0.92857143 0.96428571] mean value: 0.9327586206896552 key: train_roc_auc value: [0.99802372 0.99802372 0.9980315 0.99210264 1. 0.9980315 1. 1. 0.9980315 0.9980315 ] mean value: 0.998027605739006 key: test_jcc value: [0.79411765 0.83333333 0.87096774 1. 0.8125 0.89655172 0.875 0.90322581 0.85714286 0.93333333] mean value: 0.8776172443393375 key: train_jcc value: [0.99607843 0.99607843 0.99606299 0.98425197 1. 0.99607843 1. 1. 
0.99607843 0.99607843] mean value: 0.9960707117492666 MCC on Blind test: 0.77 Accuracy on Blind test: 0.92 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.06019211 0.02817774 0.03345776 0.03131986 0.03362656 0.03552794 0.03400302 0.03139901 0.03243303 0.03354049] mean value: 0.035367751121521 key: score_time value: [0.01245236 0.00913262 0.00972891 0.00909996 0.01145601 0.01146054 0.00974631 0.01045251 0.00957561 0.00912237] mean value: 0.010222721099853515 key: test_mcc value: [0.93202124 0.79110556 0.85960591 0.92980296 0.70082556 1. 0.85933785 0.89802651 0.93094934 0.93094934] mean value: 0.8832624260804833 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.89473684 0.92982456 0.96491228 0.83928571 1. 0.92857143 0.94642857 0.96428571 0.96428571] mean value: 0.9397243107769424 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 0.88888889 0.93103448 0.96551724 0.85714286 1. 0.93103448 0.94915254 0.96551724 0.96296296] mean value: 0.9414213662606415 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 
0.92307692 0.93103448 0.96551724 0.77142857 1. 0.9 0.90322581 0.93333333 1. ] mean value: 0.9327616358428372 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92857143 0.85714286 0.93103448 0.96551724 0.96428571 1. 0.96428571 1. 1. 0.92857143] mean value: 0.9539408866995074 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96428571 0.89408867 0.92980296 0.96490148 0.83928571 1. 0.92857143 0.94642857 0.96428571 0.96428571] mean value: 0.9395935960591133 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 0.8 0.87096774 0.93333333 0.75 1. 0.87096774 0.90322581 0.93333333 0.92857143] mean value: 0.8918970814132104 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12719536 0.13101172 0.12918282 0.12430263 0.12717915 0.12785411 0.12964702 0.13043404 0.13283753 0.13035512] mean value: 0.12899994850158691 key: score_time value: [0.01965737 0.01963568 0.01971865 0.0194478 0.01989031 0.01939631 0.01996708 0.02017188 0.0194571 0.01910806] mean value: 0.019645023345947265 key: test_mcc value: [0.92980296 0.85960591 0.71921182 0.96551724 0.89342711 0.85714286 0.8660254 0.93094934 0.83484711 0.92857143] mean value: 0.8785101179086021 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.92982456 0.85964912 0.98245614 0.94642857 0.92857143 0.92857143 0.96428571 0.91071429 0.96428571] mean value: 0.9379699248120301 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96428571 0.92857143 0.86206897 0.98245614 0.94736842 0.92857143 0.93333333 0.96551724 0.90196078 0.96428571] mean value: 0.9378419171661405 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96428571 0.92857143 0.86206897 1. 0.93103448 0.92857143 0.875 0.93333333 1. 0.96428571] mean value: 0.9387151067323481 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.92857143 0.86206897 0.96551724 0.96428571 0.92857143 1. 1. 0.82142857 0.96428571] mean value: 0.9399014778325123 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.96490148 0.92980296 0.85960591 0.98275862 0.94642857 0.92857143 0.92857143 0.96428571 0.91071429 0.96428571] mean value: 0.9379926108374385 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93103448 0.86666667 0.75757576 0.96551724 0.9 0.86666667 0.875 0.93333333 0.82142857 0.93103448] mean value: 0.8848257202567548 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.69 Accuracy on Blind test: 0.9 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01180434 0.01166081 0.01208377 0.01183748 0.011832 0.01188374 0.01314521 0.0108161 0.01178122 0.01097274] mean value: 0.011781740188598632 key: score_time value: [0.00960636 0.00973678 0.0089736 0.00985813 0.00992084 0.00999618 0.00909567 0.00982451 0.0099051 0.00940228] mean value: 0.009631943702697755 key: test_mcc value: [0.50927421 0.54377353 0.59060008 0.7257422 0.5728919 0.42857143 0.75434227 0.64450339 0.39513166 0.57142857] mean value: 0.5736259244504346 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.75438596 0.77192982 0.78947368 0.85964912 0.78571429 0.71428571 0.875 0.82142857 0.69642857 0.78571429] mean value: 0.7854010025062657 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.74074074 0.76363636 0.8125 0.87096774 0.77777778 0.71428571 0.88135593 0.82758621 0.71186441 0.78571429] mean value: 0.788642916996997 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76923077 0.77777778 0.74285714 0.81818182 0.80769231 0.71428571 0.83870968 0.8 0.67741935 0.78571429] mean value: 0.773186884799788 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.71428571 0.75 0.89655172 0.93103448 0.75 0.71428571 0.92857143 0.85714286 0.75 0.78571429] mean value: 0.8077586206896552 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75369458 0.77155172 0.78756158 0.85837438 0.78571429 0.71428571 0.875 0.82142857 0.69642857 0.78571429] mean value: 0.7849753694581281 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58823529 0.61764706 0.68421053 0.77142857 0.63636364 0.55555556 0.78787879 0.70588235 0.55263158 0.64705882] mean value: 0.6546892185901474 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.21 Accuracy on Blind test: 0.72 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.09243822 1.99595952 2.03226089 1.98863244 2.03522563 2.05493855 2.07914615 1.96964455 2.04330802 2.02386498] mean value: 2.031541895866394 key: score_time value: [0.10404348 0.10880017 0.10026383 0.09849286 0.1004591 0.10068846 0.10119557 0.10098815 0.10136032 0.09345913] mean value: 0.10097510814666748 key: test_mcc value: [1. 0.8951918 0.89988258 1. 0.89342711 0.96490128 0.93094934 0.93094934 0.92857143 1. ] mean value: 0.9443872875319015 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.94736842 0.94736842 1. 0.94642857 0.98214286 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.9716165413533835 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94545455 0.94545455 1. 0.94736842 0.98181818 0.96551724 0.96551724 0.96428571 1. ] mean value: 0.9715415890824239 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96296296 1. 1. 0.93103448 1. 
0.93333333 0.93333333 0.96428571 1. ] mean value: 0.9724949826673964 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.92857143 0.89655172 1. 0.96428571 0.96428571 1. 1. 0.96428571 1. ] mean value: 0.9717980295566503 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.94704433 0.94827586 1. 0.94642857 0.98214286 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.9716748768472907 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.89655172 0.89655172 1. 0.9 0.96428571 0.93333333 0.93333333 0.93103448 1. ] mean value: 0.9455090311986863 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.79 Accuracy on Blind test: 0.93 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.97834277 1.00785446 0.99240088 0.98267341 1.02293301 0.96102357 0.94727778 0.99126816 0.98970008 1.04643464] mean value: 0.9919908761978149 key: score_time value: [0.19152308 0.2427218 0.24201584 0.20301509 0.25864029 0.18198228 0.2727077 0.23134995 0.22634912 0.265769 ] mean value: 0.23160741329193116 key: test_mcc value: [1. 0.8951918 0.9321832 1. 0.93094934 0.96490128 0.93094934 0.93094934 0.92857143 1. ] mean value: 0.9513695720675288 key: train_mcc value: [0.9685613 0.97645211 0.97645357 0.97245522 0.98032256 0.96862405 0.97250878 0.98032256 0.97250878 0.96862405] mean value: 0.9736832978955833 key: test_accuracy value: [1. 0.94736842 0.96491228 1. 0.96428571 0.98214286 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.97515664160401 key: train_accuracy value: [0.98422091 0.98816568 0.98816568 0.98619329 0.99015748 0.98425197 0.98622047 0.99015748 0.98622047 0.98425197] mean value: 0.9868005404649863 key: test_fscore value: [1. 0.94545455 0.96428571 1. 0.96551724 0.98181818 0.96551724 0.96551724 0.96428571 1. ] mean value: 0.9752395879982088 key: train_fscore value: [0.984375 0.98828125 0.98823529 0.98624754 0.99017682 0.984375 0.98630137 0.99017682 0.98630137 0.984375 ] mean value: 0.98688454626256 key: test_precision value: [1. 0.96296296 1. 1. 0.93333333 1. 0.93333333 0.93333333 0.96428571 1. 
] mean value: 0.9727248677248678 key: train_precision value: [0.97674419 0.98062016 0.98054475 0.98046875 0.98823529 0.97674419 0.98054475 0.98823529 0.98054475 0.97674419] mean value: 0.9809426292658725 key: test_recall value: [1. 0.92857143 0.93103448 1. 1. 0.96428571 1. 1. 0.96428571 1. ] mean value: 0.9788177339901478 key: train_recall value: [0.99212598 0.99606299 0.99604743 0.99209486 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598 0.99212598] mean value: 0.9929087174379883 key: test_roc_auc value: [1. 0.94704433 0.96551724 1. 0.96428571 0.98214286 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.9751847290640394 key: train_roc_auc value: [0.98420528 0.98815007 0.9881812 0.98620491 0.99015748 0.98425197 0.98622047 0.99015748 0.98622047 0.98425197] mean value: 0.9868001307148859 key: test_jcc value: [1. 0.89655172 0.93103448 1. 0.93333333 0.96428571 0.93333333 0.93333333 0.93103448 1. ] mean value: 0.9522906403940887 key: train_jcc value: [0.96923077 0.97683398 0.97674419 0.97286822 0.98054475 0.96923077 0.97297297 0.98054475 0.97297297 0.96923077] mean value: 0.9741174127736429 MCC on Blind test: 0.83 Accuracy on Blind test: 0.94 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0242126 0.01131368 0.01130939 0.01131654 0.01126266 0.01142788 0.01134253 0.01130843 0.01151514 0.01075053] mean value: 0.012575936317443848 key: score_time value: [0.00987244 0.00902605 0.00979638 0.0095818 0.00964236 0.00970507 0.00961256 0.00966287 0.00970745 0.00935006] mean value: 0.009595704078674317 key: test_mcc value: [0.58562417 0.62473685 0.50927421 0.57973205 0.60753044 0.71428571 0.64951905 0.72168784 0.67900461 0.39310793] mean value: 0.6064502839116733 key: train_mcc value: [0.63864108 0.67343572 0.67495523 0.65362362 0.63188315 0.6387663 0.65228602 0.64665231 0.67097829 0.6472967 ] mean value: 0.6528518419173023 key: test_accuracy value: [0.78947368 0.80701754 0.75438596 0.78947368 0.80357143 0.85714286 0.82142857 0.85714286 0.83928571 0.69642857] mean value: 0.8015350877192983 key: train_accuracy value: [0.81854043 0.83629191 0.83629191 0.82642998 0.81496063 0.81889764 0.82480315 0.82283465 0.83464567 0.82283465] mean value: 0.8256530618583919 key: test_fscore value: [0.8 0.81967213 0.76666667 0.8 0.80701754 0.85714286 0.83333333 0.86666667 0.83636364 0.69090909] mean value: 0.8077771926089441 key: train_fscore value: [0.82509506 0.84069098 0.84250474 0.83011583 0.8219697 0.82375479 0.83239171 0.82758621 0.84030418 0.82889734] mean value: 0.8313310537668297 key: test_precision value: [0.75 0.75757576 0.74193548 0.77419355 0.79310345 0.85714286 0.78125 0.8125 0.85185185 0.7037037 ] mean value: 0.7823256650808097 key: train_precision value: [0.79779412 0.82022472 0.81021898 0.81132075 0.7919708 0.80223881 0.79783394 0.80597015 0.8125 0.80147059] mean 
value: 0.8051542850964287 key: test_recall value: [0.85714286 0.89285714 0.79310345 0.82758621 0.82142857 0.85714286 0.89285714 0.92857143 0.82142857 0.67857143] mean value: 0.8370689655172414 key: train_recall value: [0.85433071 0.86220472 0.87747036 0.84980237 0.85433071 0.84645669 0.87007874 0.8503937 0.87007874 0.85826772] mean value: 0.8593414459556192 key: test_roc_auc value: [0.79064039 0.80849754 0.75369458 0.7887931 0.80357143 0.85714286 0.82142857 0.85714286 0.83928571 0.69642857] mean value: 0.8016625615763546 key: train_roc_auc value: [0.8184697 0.8362407 0.83637297 0.82647599 0.81496063 0.81889764 0.82480315 0.82283465 0.83464567 0.82283465] mean value: 0.8256535744296786 key: test_jcc value: [0.66666667 0.69444444 0.62162162 0.66666667 0.67647059 0.75 0.71428571 0.76470588 0.71875 0.52777778] mean value: 0.6801389362051127 key: train_jcc value: [0.70226537 0.72516556 0.72786885 0.70957096 0.6977492 0.70032573 0.71290323 0.70588235 0.72459016 0.70779221] mean value: 0.7114113624151682 MCC on Blind test: 0.31 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08987308 0.22393084 0.25971007 0.22161651 0.2970469 0.07107282 0.3908968 0.34707355 0.3672266 0.26015067] mean value: 0.25285978317260743 key: score_time value: [0.01140714 0.01223254 0.01125836 0.01227474 0.0113318 0.01112461 0.01194763 0.01308942 0.01288772 0.01307106] mean value: 0.012062501907348634 key: test_mcc value: [1. 0.82880708 0.96551724 0.96547546 0.89802651 1. 0.96490128 0.93094934 0.96490128 1. ] mean value: 0.9518578190858389 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9122807 0.98245614 0.98245614 0.94642857 1. 0.98214286 0.96428571 0.98214286 1. ] mean value: 0.975219298245614 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.90566038 0.98245614 0.98305085 0.94915254 1. 0.98245614 0.96551724 0.98245614 1. ] mean value: 0.9750749429620941 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96 1. 0.96666667 0.90322581 1. 0.96551724 0.93333333 0.96551724 1. ] mean value: 0.9694260289210234 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.85714286 0.96551724 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9822660098522168 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.91133005 0.98275862 0.98214286 0.94642857 1. 0.98214286 0.96428571 0.98214286 1. 
] mean value: 0.9751231527093597 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.82758621 0.96551724 0.96666667 0.90322581 1. 0.96551724 0.93333333 0.96551724 1. ] mean value: 0.9527363737486095 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0668292 0.06841993 0.12328172 0.05418634 0.05178666 0.0750885 0.09460568 0.09279466 0.08649731 0.06908274] mean value: 0.07825727462768554 key: score_time value: [0.01650906 0.02038956 0.02707815 0.01239896 0.01989675 0.01733708 0.01960182 0.02041531 0.02011204 0.01233983] mean value: 0.01860785484313965 key: test_mcc value: [0.9321832 0.82512315 0.82490815 0.8951918 0.89342711 1. 0.85933785 0.82618439 1. 0.96490128] mean value: 0.9021256935269077 key: train_mcc value: [0.96055211 0.96844169 0.97239426 0.96844169 0.96850394 0.9645744 0.96850394 0.98425197 0.96850394 0.9645744 ] mean value: 0.968874234820875 key: test_accuracy value: [0.96491228 0.9122807 0.9122807 0.94736842 0.94642857 1. 0.92857143 0.91071429 1. 
0.98214286] mean value: 0.9504699248120301 key: train_accuracy value: [0.98027613 0.98422091 0.98619329 0.98422091 0.98425197 0.98228346 0.98425197 0.99212598 0.98425197 0.98228346] mean value: 0.9844360061501188 key: test_fscore value: [0.96551724 0.9122807 0.91525424 0.94915254 0.94736842 1. 0.93103448 0.91525424 1. 0.98181818] mean value: 0.9517680045712283 key: train_fscore value: [0.98031496 0.98425197 0.98619329 0.98418972 0.98425197 0.98224852 0.98425197 0.99212598 0.98425197 0.98231827] mean value: 0.98443986279333 key: test_precision value: [0.93333333 0.89655172 0.9 0.93333333 0.93103448 1. 0.9 0.87096774 1. 1. ] mean value: 0.9365220615498703 key: train_precision value: [0.98031496 0.98425197 0.98425197 0.98418972 0.98425197 0.98418972 0.98425197 0.99212598 0.98425197 0.98039216] mean value: 0.9842472390904636 key: test_recall value: [1. 0.92857143 0.93103448 0.96551724 0.96428571 1. 0.96428571 0.96428571 1. 0.96428571] mean value: 0.9682266009852217 key: train_recall value: [0.98031496 0.98425197 0.98814229 0.98418972 0.98425197 0.98031496 0.98425197 0.99212598 0.98425197 0.98425197] mean value: 0.9846347763841773 key: test_roc_auc value: [0.96551724 0.91256158 0.91194581 0.94704433 0.94642857 1. 0.92857143 0.91071429 1. 0.98214286] mean value: 0.9504926108374385 key: train_roc_auc value: [0.98027606 0.98422085 0.98619713 0.98422085 0.98425197 0.98228346 0.98425197 0.99212598 0.98425197 0.98228346] mean value: 0.984436369860882 key: test_jcc value: [0.93333333 0.83870968 0.84375 0.90322581 0.9 1. 0.87096774 0.84375 1. 
0.96428571] mean value: 0.9098022273425499 key: train_jcc value: [0.96138996 0.96899225 0.97276265 0.9688716 0.96899225 0.96511628 0.96899225 0.984375 0.96899225 0.96525097] mean value: 0.9693735439203892 MCC on Blind test: 0.79 Accuracy on Blind test: 0.93 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02307963 0.01162243 0.01094699 0.01106811 0.01101923 0.01119828 0.01114917 0.01126027 0.01035953 0.01032233] mean value: 0.012202596664428711 key: score_time value: [0.01033926 0.01000857 0.00924134 0.00996804 0.01003981 0.01006603 0.00951672 0.00953388 0.00911236 0.00968027] mean value: 0.009750628471374511 key: test_mcc value: [0.52204981 0.68850906 0.57881773 0.64901478 0.50128041 0.64285714 0.57735027 0.65814518 0.64285714 0.53605627] mean value: 0.599693779541295 key: train_mcc value: [0.60210948 0.6702837 0.66589861 0.59833978 0.6189214 0.59993353 0.65074202 0.63496646 0.6918185 0.59961602] mean value: 0.6332629515394039 key: test_accuracy value: [0.75438596 0.84210526 0.78947368 0.8245614 0.75 0.82142857 0.78571429 0.82142857 0.82142857 0.76785714] mean value: 0.7978383458646616 key: train_accuracy value: [0.80078895 0.83431953 0.83234714 0.79881657 0.80905512 0.7992126 0.82480315 0.81692913 0.84448819 0.7992126 ] mean value: 0.8159972976750687 key: 
test_fscore value: [0.77419355 0.84745763 0.79310345 0.82758621 0.74074074 0.82142857 0.8 0.83870968 0.82142857 0.77192982] mean value: 0.8036578216256797 key: train_fscore value: [0.80539499 0.84030418 0.83685221 0.8030888 0.81381958 0.80608365 0.82982792 0.82217973 0.85122411 0.80534351] mean value: 0.8214118676278633 key: test_precision value: [0.70588235 0.80645161 0.79310345 0.82758621 0.76923077 0.82142857 0.75 0.76470588 0.82142857 0.75862069] mean value: 0.7818438105112842 key: train_precision value: [0.78867925 0.8125 0.81343284 0.78490566 0.79400749 0.77941176 0.80669145 0.79925651 0.81588448 0.78148148] mean value: 0.7976250910229972 key: test_recall value: [0.85714286 0.89285714 0.79310345 0.82758621 0.71428571 0.82142857 0.85714286 0.92857143 0.82142857 0.78571429] mean value: 0.8299261083743842 key: train_recall value: [0.82283465 0.87007874 0.86166008 0.82213439 0.83464567 0.83464567 0.85433071 0.84645669 0.88976378 0.83070866] mean value: 0.8467259033332296 key: test_roc_auc value: [0.75615764 0.8429803 0.78940887 0.82450739 0.75 0.82142857 0.78571429 0.82142857 0.82142857 0.76785714] mean value: 0.7980911330049261 key: train_roc_auc value: [0.80074539 0.83424886 0.83240484 0.79886247 0.80905512 0.7992126 0.82480315 0.81692913 0.84448819 0.7992126 ] mean value: 0.8159962341663813 key: test_jcc value: [0.63157895 0.73529412 0.65714286 0.70588235 0.58823529 0.6969697 0.66666667 0.72222222 0.6969697 0.62857143] mean value: 0.6729533280616872 key: train_jcc value: [0.67419355 0.72459016 0.71947195 0.67096774 0.68608414 0.67515924 0.70915033 0.69805195 0.74098361 0.67412141] mean value: 0.6972774066672848 MCC on Blind test: 0.52 Accuracy on Blind test: 0.8 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02162695 0.02330804 0.02636027 0.02295399 0.0236361 0.06173849 0.02118683 0.02854466 0.02622795 0.0239768 ] mean value: 0.027956008911132812 key: score_time value: [0.01124668 0.01192689 0.01218081 0.01735902 0.01788497 0.0120585 0.01298404 0.01233673 0.0231607 0.01238585] mean value: 0.0143524169921875 key: test_mcc value: [0.86189955 0.82880708 0.86189955 0.89988258 0.79385662 1. 0.89802651 0.89342711 0.80439967 0.93094934] mean value: 0.8773147998171975 key: train_mcc value: [0.97636129 0.96450468 0.97239426 0.91875999 0.93470218 0.95322883 0.94970991 0.97649905 0.77972956 0.97250878] mean value: 0.9398398525421525 key: test_accuracy value: [0.92982456 0.9122807 0.92982456 0.94736842 0.89285714 1. 0.94642857 0.94642857 0.89285714 0.96428571] mean value: 0.9362155388471178 key: train_accuracy value: [0.98816568 0.98224852 0.98619329 0.95857988 0.96653543 0.97637795 0.97440945 0.98818898 0.87992126 0.98622047] mean value: 0.9686840920032925 key: test_fscore value: [0.93103448 0.90566038 0.92857143 0.94545455 0.9 1. 0.94915254 0.94736842 0.88 0.96551724] mean value: 0.9352759038947909 key: train_fscore value: [0.98823529 0.98224852 0.98619329 0.95723014 0.96749522 0.97674419 0.97495183 0.98809524 0.86474501 0.98630137] mean value: 0.9672240106699174 key: test_precision value: [0.9 0.96 0.96296296 1. 0.84375 1. 0.90322581 0.93103448 1. 
0.93333333] mean value: 0.943430658550653 key: train_precision value: [0.984375 0.98418972 0.98425197 0.98739496 0.94052045 0.96183206 0.95471698 0.996 0.98984772 0.98054475] mean value: 0.9763673600922473 key: test_recall value: [0.96428571 0.85714286 0.89655172 0.89655172 0.96428571 1. 1. 0.96428571 0.78571429 1. ] mean value: 0.9328817733990148 key: train_recall value: [0.99212598 0.98031496 0.98814229 0.92885375 0.99606299 0.99212598 0.99606299 0.98031496 0.76771654 0.99212598] mean value: 0.9613846441131617 key: test_roc_auc value: [0.93041872 0.91133005 0.93041872 0.94827586 0.89285714 1. 0.94642857 0.94642857 0.89285714 0.96428571] mean value: 0.9363300492610838 key: train_roc_auc value: [0.98815785 0.98225234 0.98619713 0.95852137 0.96653543 0.97637795 0.97440945 0.98818898 0.87992126 0.98622047] mean value: 0.9686782235224549 key: test_jcc value: [0.87096774 0.82758621 0.86666667 0.89655172 0.81818182 1. 0.90322581 0.9 0.78571429 0.93333333] mean value: 0.8802227583317683 key: train_jcc value: [0.97674419 0.96511628 0.97276265 0.91796875 0.93703704 0.95454545 0.95112782 0.97647059 0.76171875 0.97297297] mean value: 0.9386464483370307 MCC on Blind test: 0.74 Accuracy on Blind test: 0.91 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01865149 0.02236271 0.02846265 0.02018642 0.02099609 0.01799417 0.01969862 0.01990604 0.02230525 0.02031541] mean value: 0.0210878849029541 key: score_time value: [0.02322531 0.0123446 0.01212907 0.01237273 0.01238465 0.01233506 0.01233649 0.01211786 0.01209664 0.02831006] mean value: 0.014965248107910157 key: test_mcc value: [0.56067321 0.82880708 0.66755025 0.93202124 0.71428571 0.74535599 0.60485838 0.85714286 0.96490128 0.93094934] mean value: 0.7806545352178498 key: train_mcc value: [0.54270333 0.98028353 0.80481374 0.9417201 0.97244848 0.75996798 0.81248429 0.92727605 0.94217971 0.95349515] mean value: 0.8637372357336665 key: test_accuracy value: [0.73684211 0.9122807 0.80701754 0.96491228 0.85714286 0.85714286 0.76785714 0.92857143 0.98214286 0.96428571] mean value: 0.8778195488721804 key: train_accuracy value: [0.72781065 0.99013807 0.89546351 0.9704142 0.98622047 0.86811024 0.8976378 0.96259843 0.97047244 0.97637795] mean value: 0.9245243752814922 key: test_fscore value: [0.78873239 0.90566038 0.76595745 0.96666667 0.85714286 0.83333333 0.8115942 0.92857143 0.98245614 0.96551724] mean value: 0.8805632088876222 key: train_fscore value: [0.78637771 0.99017682 0.88453159 0.97098646 0.98624754 0.8494382 0.90714286 0.96130346 0.97120921 0.97683398] mean value: 0.9284247832831198 key: test_precision value: [0.65116279 0.96 1. 0.93548387 0.85714286 1. 
0.68292683 0.92857143 0.96551724 0.93333333] mean value: 0.8914138351360639 key: train_precision value: [0.64795918 0.98823529 0.98543689 0.95075758 0.98431373 0.9895288 0.83006536 0.99578059 0.94756554 0.95833333] mean value: 0.9277976294653208 key: test_recall value: [1. 0.85714286 0.62068966 1. 0.85714286 0.71428571 1. 0.92857143 1. 1. ] mean value: 0.8977832512315271 key: train_recall value: [1. 0.99212598 0.80237154 0.99209486 0.98818898 0.74409449 1. 0.92913386 0.99606299 0.99606299] mean value: 0.9440135694500638 key: test_roc_auc value: [0.74137931 0.91133005 0.81034483 0.96428571 0.85714286 0.85714286 0.76785714 0.92857143 0.98214286 0.96428571] mean value: 0.878448275862069 key: train_roc_auc value: [0.72727273 0.99013414 0.89528026 0.97045688 0.98622047 0.86811024 0.8976378 0.96259843 0.97047244 0.97637795] mean value: 0.9244561327067319 key: test_jcc value: [0.65116279 0.82758621 0.62068966 0.93548387 0.75 0.71428571 0.68292683 0.86666667 0.96551724 0.93333333] mean value: 0.79476523086677 key: train_jcc value: [0.64795918 0.98054475 0.79296875 0.94360902 0.97286822 0.73828125 0.83006536 0.9254902 0.94402985 0.95471698] mean value: 0.8730533557799736 MCC on Blind test: 0.62 Accuracy on Blind test: 0.82 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.25352716 0.23959851 0.23895645 0.23683834 0.23459578 0.23570085 0.23769593 0.24037766 0.24266458 0.23892522] mean value: 0.23988804817199708 key: score_time value: [0.01613116 0.01571035 0.01589203 0.01570082 0.01559377 0.01582003 0.01564693 0.01657271 0.01562572 0.01605487] mean value: 0.015874838829040526 key: test_mcc value: [0.96547546 0.82880708 0.9321832 1. 0.89802651 1. 0.92857143 0.93094934 0.96490128 1. ] mean value: 0.944891429609529 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.9122807 0.96491228 1. 0.94642857 1. 0.96428571 0.96428571 0.98214286 1. ] mean value: 0.9716791979949875 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98181818 0.90566038 0.96428571 1. 0.94915254 1. 0.96428571 0.96551724 0.98245614 1. ] mean value: 0.971317591185117 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96 1. 1. 0.90322581 1. 0.96428571 0.93333333 0.96551724 1. ] mean value: 0.9726362095449971 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96428571 0.85714286 0.93103448 1. 1. 1. 0.96428571 1. 1. 1. ] mean value: 0.9716748768472906 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98214286 0.91133005 0.96551724 1. 0.94642857 1. 0.96428571 0.96428571 0.98214286 1. ] mean value: 0.9716133004926109 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.96428571 0.82758621 0.93103448 1. 0.90322581 1. 0.93103448 0.93333333 0.96551724 1. ] mean value: 0.9456017267863764 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores.
This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06545424 0.09734583 0.08027411 0.07696271 0.09823036 0.08253455 0.10306764 0.08882976 0.0632627 0.07575917] mean value: 0.08317210674285888 key: score_time value: [0.01944017 0.03101039 0.03404474 0.02634931 0.02568722 0.02294087 0.02933931 0.02022123 0.02197647 0.0271163 ] mean value: 0.025812602043151854 key: test_mcc value: [0.96547546 0.79110556 0.89988258 1. 0.79385662 1. 0.85714286 0.93094934 0.93094934 1. ] mean value: 0.9169361747244729 key: train_mcc value: [1. 0.99211042 0.98817342 0.98425123 1. 0.99607071 0.99607071 0.98819663 0.99607071 0.98819663] mean value: 0.9929140477452736 key: test_accuracy value: [0.98245614 0.89473684 0.94736842 1. 0.89285714 1. 0.92857143 0.96428571 0.96428571 1. ] mean value: 0.9574561403508772 key: train_accuracy value: [1. 0.99605523 0.99408284 0.99211045 1. 0.9980315 0.9980315 0.99409449 0.9980315 0.99409449] mean value: 0.9964531985276989 key: test_fscore value: [0.98181818 0.88888889 0.94545455 1. 0.9 1. 0.92857143 0.96551724 0.96551724 1. ] mean value: 0.9575767527491665 key: train_fscore value: [1. 0.99606299 0.99408284 0.99206349 1. 0.99802761 0.99803536 0.99410609 0.99803536 0.99410609] mean value: 0.9964519845500475 key: test_precision value: [1. 0.92307692 1. 1. 0.84375 1. 0.92857143 0.93333333 0.93333333 1. ] mean value: 0.9562065018315018 key: train_precision value: [1. 0.99606299 0.99212598 0.99601594 1. 1. 0.99607843 0.99215686 0.99607843 0.99215686] mean value: 0.9960675500868227 key: test_recall value: [0.96428571 0.85714286 0.89655172 1. 0.96428571 1. 0.92857143 1. 
1. 1. ] mean value: 0.9610837438423645 key: train_recall value: [1. 0.99606299 0.99604743 0.98814229 1. 0.99606299 1. 0.99606299 1. 0.99606299] mean value: 0.9968441691824095 key: test_roc_auc value: [0.98214286 0.89408867 0.94827586 1. 0.89285714 1. 0.92857143 0.96428571 0.96428571 1. ] mean value: 0.9574507389162562 key: train_roc_auc value: [1. 0.99605521 0.99408671 0.99210264 1. 0.9980315 0.9980315 0.99409449 0.9980315 0.99409449] mean value: 0.9964528025893996 key: test_jcc value: [0.96428571 0.8 0.89655172 1. 0.81818182 1. 0.86666667 0.93333333 0.93333333 1. ] mean value: 0.9212352589938797 key: train_jcc value: [1. 0.99215686 0.98823529 0.98425197 1. 0.99606299 0.99607843 0.98828125 0.99607843 0.98828125] mean value: 0.9929426480237764 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.24165225 0.2639544 0.21907234 0.1663208 0.1660831 0.18476343 0.18354225 0.15284801 0.23790598 0.2857213 ] mean value: 0.21018638610839843 key: score_time value: [0.02633309 0.04548001 0.03008986 0.05320883 0.0402391 0.03054214 0.01558995 0.02765036 0.02605081 0.04492259] mean value: 0.034010672569274904 key: test_mcc value: [0.5149026 0.78940887 0.51851399 0.79778885 0.75047877 0.78571429 0.61706091 0.89802651 0.79385662 0.75047877] mean value: 0.7216230189409703 key: train_mcc value: [0.99606293 0.98817342 0.98817323 0.98425123 0.98819663 0.99607071 0.99212598 0.98819663 0.99212598 0.98819663] mean value: 0.9901573396651306 key: test_accuracy value: [0.75438596 0.89473684 0.75438596 0.89473684 0.875 0.89285714 0.80357143 0.94642857 0.89285714 0.875 ] mean value: 0.8583959899749374 key: train_accuracy value: [0.99802761 0.99408284 0.99408284 0.99211045 0.99409449 0.9980315 0.99606299 0.99409449 0.99606299 0.99409449] mean value: 0.9950744692416407 key: test_fscore value: [0.76666667 0.89285714 0.78125 0.88888889 0.87719298 0.89285714 0.81967213 0.94915254 0.88461538 0.87272727] mean value: 0.8625880154589062 key: train_fscore value: [0.99803536 0.99408284 0.99405941 0.99206349 0.99408284 0.99803536 0.99606299 0.99408284 0.99606299 0.99408284] mean value: 0.9950650970118321 key: test_precision value: [0.71875 0.89285714 0.71428571 0.96 0.86206897 0.89285714 0.75757576 0.90322581 0.95833333 0.88888889] mean value: 0.8548842751766834 key: train_precision value: [0.99607843 0.99604743 0.99603175 0.99601594 0.99604743 0.99607843 0.99606299 
0.99604743 0.99606299 0.99604743] mean value: 0.996052025260395 key: test_recall value: [0.82142857 0.89285714 0.86206897 0.82758621 0.89285714 0.89285714 0.89285714 1. 0.82142857 0.85714286] mean value: 0.8761083743842365 key: train_recall value: [1. 0.99212598 0.99209486 0.98814229 0.99212598 1. 0.99606299 0.99212598 0.99606299 0.99212598] mean value: 0.994086707541004 key: test_roc_auc value: [0.75554187 0.89470443 0.75246305 0.89593596 0.875 0.89285714 0.80357143 0.94642857 0.89285714 0.875 ] mean value: 0.858435960591133 key: train_roc_auc value: [0.99802372 0.99408671 0.99407893 0.99210264 0.99409449 0.9980315 0.99606299 0.99409449 0.99606299 0.99409449] mean value: 0.9950732937038996 key: test_jcc value: [0.62162162 0.80645161 0.64102564 0.8 0.78125 0.80645161 0.69444444 0.90322581 0.79310345 0.77419355] mean value: 0.762176773601273 key: train_jcc value: [0.99607843 0.98823529 0.98818898 0.98425197 0.98823529 0.99607843 0.99215686 0.98823529 0.99215686 0.98823529] mean value: 0.9901852709587773 MCC on Blind test: 0.47 Accuracy on Blind test: 0.82 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.97148848 0.95016146 0.97649026 0.94490004 0.95638227 0.95632815 0.95845008 0.95048833 0.95981479 0.96253824] mean value: 0.9587042093276977 key: score_time value: [0.00969386 0.00933099 0.00937676 0.00944138 0.00976372 0.00949907 0.00940275 0.00956464 0.009552 0.00934672] mean value: 0.00949718952178955 key: test_mcc value: [0.93202124 0.82880708 0.9321832 1. 0.8660254 1. 0.89342711 0.93094934 0.96490128 0.96490128] mean value: 0.9313215939799506 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.9122807 0.96491228 1. 0.92857143 1. 0.94642857 0.96428571 0.98214286 0.98214286] mean value: 0.9645676691729324 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96296296 0.90566038 0.96428571 1. 0.93333333 1. 0.94545455 0.96551724 0.98245614 0.98181818] mean value: 0.9641488496943416 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96 1. 1. 0.875 1. 0.96296296 0.93333333 0.96551724 1. ] mean value: 0.9696813537675607 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92857143 0.85714286 0.93103448 1. 1. 1. 0.92857143 1. 1. 0.96428571] mean value: 0.9609605911330049 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96428571 0.91133005 0.96551724 1. 0.92857143 1. 0.94642857 0.96428571 0.98214286 0.98214286] mean value: 0.9644704433497537 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 
1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92857143 0.82758621 0.93103448 1. 0.875 1. 0.89655172 0.93333333 0.96551724 0.96428571] mean value: 0.932188013136289 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03023362 0.03938198 0.03128219 0.03190351 0.03164005 0.03126192 0.03192115 0.03110099 0.03161693 0.0320549 ] mean value: 0.03223972320556641 key: score_time value: [0.01244426 0.01752877 0.01386118 0.01407957 0.01385164 0.01395226 0.01401639 0.01404047 0.01423621 0.01403999] mean value: 0.014205074310302735 key: test_mcc value: [0.8951918 0.8951918 0.93202124 1. 0.96490128 0.93094934 0.96490128 0.96490128 0.96490128 1. ] mean value: 0.9512959308288262 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.94736842 0.96491228 1. 0.98214286 0.96428571 0.98214286 0.98214286 0.98214286 1. ] mean value: 0.975250626566416 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94545455 0.94545455 0.96666667 1. 0.98245614 0.96551724 0.98245614 0.98245614 0.98245614 1. 
] mean value: 0.9752917560358576 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 0.96296296 0.93548387 1. 0.96551724 0.93333333 0.96551724 0.96551724 0.96551724 1. ] mean value: 0.9656812095744243 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92857143 0.92857143 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94704433 0.94704433 0.96428571 1. 0.98214286 0.96428571 0.98214286 0.98214286 0.98214286 1. ] mean value: 0.9751231527093597 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.89655172 0.89655172 0.93548387 1. 0.96551724 0.93333333 0.96551724 0.96551724 0.96551724 1. ] mean value: 0.9523989618094179 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.1 Accuracy on Blind test: 0.76 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02729225 0.03505087 0.03882575 0.04039145 0.03900075 0.03893924 0.03912377 0.03932238 0.03904605 0.03894711] mean value: 0.037593960762023926 key: score_time value: [0.01924682 0.02053595 0.01905107 0.0189836 0.01907921 0.01899457 0.01899743 0.01894832 0.01898527 0.01905417] mean value: 0.01918764114379883 key: test_mcc value: [0.9321832 0.8951918 0.92980296 1. 0.85933785 1. 0.85933785 0.85933785 0.96490128 0.93094934] mean value: 0.9231042121808317 key: train_mcc value: [0.96450413 0.97239383 0.96847232 0.96847232 0.97244848 0.9645744 0.97244848 0.97637795 0.9645744 0.9645744 ] mean value: 0.9688840740520619 key: test_accuracy value: [0.96491228 0.94736842 0.96491228 1. 0.92857143 1. 0.92857143 0.92857143 0.98214286 0.96428571] mean value: 0.9609335839598998 key: train_accuracy value: [0.98224852 0.98619329 0.98422091 0.98422091 0.98622047 0.98228346 0.98622047 0.98818898 0.98228346 0.98228346] mean value: 0.9844363944151951 key: test_fscore value: [0.96551724 0.94545455 0.96551724 1. 0.93103448 1. 0.93103448 0.93103448 0.98181818 0.96551724] mean value: 0.961692789968652 key: train_fscore value: [0.98231827 0.98624754 0.98425197 0.98425197 0.98624754 0.98231827 0.98624754 0.98818898 0.98231827 0.98231827] mean value: 0.9844708630478165 key: test_precision value: [0.93333333 0.96296296 0.96551724 1. 0.9 1. 0.9 0.9 1. 
0.93333333] mean value: 0.949514687100894 key: train_precision value: [0.98039216 0.98431373 0.98039216 0.98039216 0.98431373 0.98039216 0.98431373 0.98818898 0.98039216 0.98039216] mean value: 0.9823483094025012 key: test_recall value: [1. 0.92857143 0.96551724 1. 0.96428571 1. 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.9751231527093596 key: train_recall value: [0.98425197 0.98818898 0.98814229 0.98814229 0.98818898 0.98425197 0.98818898 0.98818898 0.98425197 0.98425197] mean value: 0.9866048364507797 key: test_roc_auc value: [0.96551724 0.94704433 0.96490148 1. 0.92857143 1. 0.92857143 0.92857143 0.98214286 0.96428571] mean value: 0.960960591133005 key: train_roc_auc value: [0.98224456 0.98618935 0.98422863 0.98422863 0.98622047 0.98228346 0.98622047 0.98818898 0.98228346 0.98228346] mean value: 0.9844371479256793 key: test_jcc value: [0.93333333 0.89655172 0.93333333 1. 0.87096774 1. 0.87096774 0.87096774 0.96428571 0.93333333] mean value: 0.9273740664230097 key: train_jcc value: [0.96525097 0.97286822 0.96899225 0.96899225 0.97286822 0.96525097 0.97286822 0.9766537 0.96525097 0.96525097] mean value: 0.9694246704788737 MCC on Blind test: 0.81 Accuracy on Blind test: 0.93 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.37074566 0.28954816 0.30450511 0.2897861 0.29390836 0.33179474 0.34224701 0.31382322 0.29044604 0.29306006] mean value: 0.31198644638061523 key: score_time value: [0.01924872 0.01909328 0.01921964 0.01943707 0.01910305 0.01929712 0.01911283 0.01910567 0.01906586 0.01909971] mean value: 0.019178295135498048 key: test_mcc value: [0.9321832 0.8951918 0.92980296 1. 0.85933785 1. 0.85933785 0.85933785 0.96490128 0.93094934] mean value: 0.9231042121808317 key: train_mcc value: [0.96450413 0.97239383 0.96847232 0.96847232 0.97244848 0.9645744 0.97244848 0.97637795 0.9645744 0.9645744 ] mean value: 0.9688840740520619 key: test_accuracy value: [0.96491228 0.94736842 0.96491228 1. 0.92857143 1. 0.92857143 0.92857143 0.98214286 0.96428571] mean value: 0.9609335839598998 key: train_accuracy value: [0.98224852 0.98619329 0.98422091 0.98422091 0.98622047 0.98228346 0.98622047 0.98818898 0.98228346 0.98228346] mean value: 0.9844363944151951 key: test_fscore value: [0.96551724 0.94545455 0.96551724 1. 0.93103448 1. 
0.93103448 0.93103448 0.98181818 0.96551724] mean value: 0.961692789968652 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:128: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:131: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.98231827 0.98624754 0.98425197 0.98425197 0.98624754 0.98231827 0.98624754 0.98818898 0.98231827 0.98231827] mean value: 0.9844708630478165 key: test_precision value: [0.93333333 0.96296296 0.96551724 1. 0.9 1. 0.9 0.9 1. 0.93333333] mean value: 0.949514687100894 key: train_precision value: [0.98039216 0.98431373 0.98039216 0.98039216 0.98431373 0.98039216 0.98431373 0.98818898 0.98039216 0.98039216] mean value: 0.9823483094025012 key: test_recall value: [1. 0.92857143 0.96551724 1. 0.96428571 1. 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.9751231527093596 key: train_recall value: [0.98425197 0.98818898 0.98814229 0.98814229 0.98818898 0.98425197 0.98818898 0.98818898 0.98425197 0.98425197] mean value: 0.9866048364507797 key: test_roc_auc value: [0.96551724 0.94704433 0.96490148 1. 0.92857143 1. 0.92857143 0.92857143 0.98214286 0.96428571] mean value: 0.960960591133005 key: train_roc_auc value: [0.98224456 0.98618935 0.98422863 0.98422863 0.98622047 0.98228346 0.98622047 0.98818898 0.98228346 0.98228346] mean value: 0.9844371479256793 key: test_jcc value: [0.93333333 0.89655172 0.93333333 1. 0.87096774 1. 
0.87096774 0.87096774 0.96428571 0.93333333] mean value: 0.9273740664230097 key: train_jcc value: [0.96525097 0.97286822 0.96899225 0.96899225 0.97286822 0.96525097 0.97286822 0.9766537 0.96525097 0.96525097] mean value: 0.9694246704788737 MCC on Blind test: 0.81 Accuracy on Blind test: 0.93 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03503633 0.04945874 0.03754044 0.03722906 0.03705621 0.03847361 0.03858423 0.03832555 0.03757358 0.03789806] mean value: 0.03871757984161377 key: score_time value: [0.01331329 0.01399422 0.01336384 0.01483345 0.01530671 0.01369762 0.01375341 0.01342893 0.01370549 0.01364231] mean value: 0.013903927803039551 key: test_mcc value: [0.83797038 0.82512315 0.82880708 0.9321832 0.71611487 0.89342711 0.79385662 0.85933785 0.71428571 0.96490128] mean value: 0.8366007266767884 key: train_mcc value: [0.90933143 0.9172256 0.89754406 0.90144111 0.91732994 0.90945587 0.90158179 0.91738682 0.90951226 0.89766562] mean value: 0.9078474512102387 key: test_accuracy value: [0.9122807 0.9122807 0.9122807 0.96491228 0.85714286 0.94642857 0.89285714 0.92857143 0.85714286 0.98214286] mean value: 0.9166040100250626 key: train_accuracy value: [0.95463511 0.95857988 
0.94871795 0.95069034 0.95866142 0.95472441 0.9507874 0.95866142 0.95472441 0.9488189 ] mean value: 0.953900122691764 key: test_fscore value: [0.91803279 0.9122807 0.91803279 0.96428571 0.86206897 0.94545455 0.9 0.93103448 0.85714286 0.98245614] mean value: 0.9190788981034733 key: train_fscore value: [0.95499022 0.95841584 0.94820717 0.95029821 0.95857988 0.95463511 0.95069034 0.95841584 0.95499022 0.9486166 ] mean value: 0.9537839421981321 key: test_precision value: [0.84848485 0.89655172 0.875 1. 0.83333333 0.96296296 0.84375 0.9 0.85714286 0.96551724] mean value: 0.8982742967441243 key: train_precision value: [0.94941634 0.96414343 0.95582329 0.956 0.96047431 0.95652174 0.95256917 0.96414343 0.94941634 0.95238095] mean value: 0.9560889000359492 key: test_recall value: [1. 0.92857143 0.96551724 0.93103448 0.89285714 0.92857143 0.96428571 0.96428571 0.85714286 1. ] mean value: 0.9432266009852217 key: train_recall value: [0.96062992 0.95275591 0.94071146 0.94466403 0.95669291 0.95275591 0.9488189 0.95275591 0.96062992 0.94488189] mean value: 0.9515296753913666 key: test_roc_auc value: [0.9137931 0.91256158 0.91133005 0.96551724 0.85714286 0.94642857 0.89285714 0.92857143 0.85714286 0.98214286] mean value: 0.9167487684729064 key: train_roc_auc value: [0.95462326 0.95859139 0.94870219 0.95067847 0.95866142 0.95472441 0.9507874 0.95866142 0.95472441 0.9488189 ] mean value: 0.9538973265693567 key: test_jcc value: [0.84848485 0.83870968 0.84848485 0.93103448 0.75757576 0.89655172 0.81818182 0.87096774 0.75 0.96551724] mean value: 0.8525508140357974 key: train_jcc value: [0.91385768 0.92015209 0.90151515 0.90530303 0.92045455 0.91320755 0.90601504 0.92015209 0.91385768 0.90225564] mean value: 0.9116770489449016 MCC on Blind test: 0.69 Accuracy on Blind test: 0.88 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: 
ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.06151056 0.97917652 1.15594339 1.03205562 1.13682246 1.09652424 1.14049935 0.84171343 1.05300856 0.92840409] mean value: 1.0425658226013184 key: score_time value: [0.01576757 0.01351643 0.01401377 0.01967168 0.01395941 0.02089858 0.01919866 0.01392365 0.01353335 0.01369238] mean value: 0.015817546844482423 key: test_mcc value: [0.89988258 0.82512315 0.93202124 0.93202124 0.82618439 0.93094934 0.89802651 0.89802651 0.89342711 0.96490128] mean value: 0.900056335305218 key: train_mcc value: [0.99211042 0.98028384 0.98817342 0.99211042 1. 0.98819663 1. 1. 0.99212598 0.99212598] mean value: 0.9925126702482893 key: test_accuracy value: [0.94736842 0.9122807 0.96491228 0.96491228 0.91071429 0.96428571 0.94642857 0.94642857 0.94642857 0.98214286] mean value: 0.9485902255639097 key: train_accuracy value: [0.99605523 0.99013807 0.99408284 0.99605523 1. 0.99409449 1. 1. 
0.99606299 0.99606299] mean value: 0.9962551833387691 key: test_fscore value: [0.94915254 0.9122807 0.96666667 0.96666667 0.91525424 0.96296296 0.94915254 0.94915254 0.94736842 0.98245614] mean value: 0.9501113423860971 key: train_fscore value: [0.99606299 0.99013807 0.99408284 0.99604743 1. 0.99410609 1. 1. 0.99606299 0.99606299] mean value: 0.9962563404879103 key: test_precision value: [0.90322581 0.89655172 0.93548387 0.93548387 0.87096774 1. 0.90322581 0.90322581 0.93103448 0.96551724] mean value: 0.9244716351501668 key: train_precision value: [0.99606299 0.99209486 0.99212598 0.99604743 1. 0.99215686 1. 1. 0.99606299 0.99606299] mean value: 0.9960614115865138 key: test_recall value: [1. 0.92857143 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9785714285714285 key: train_recall value: [0.99606299 0.98818898 0.99604743 0.99604743 1. 0.99606299 1. 1. 0.99606299 0.99606299] mean value: 0.9964535806541969 key: test_roc_auc value: [0.94827586 0.91256158 0.96428571 0.96428571 0.91071429 0.96428571 0.94642857 0.94642857 0.94642857 0.98214286] mean value: 0.9485837438423645 key: train_roc_auc value: [0.99605521 0.99014192 0.99408671 0.99605521 1. 0.99409449 1. 1. 0.99606299 0.99606299] mean value: 0.9962559521956988 key: test_jcc value: [0.90322581 0.83870968 0.93548387 0.93548387 0.84375 0.92857143 0.90322581 0.90322581 0.9 0.96551724] mean value: 0.9057193508660416 key: train_jcc value: [0.99215686 0.98046875 0.98823529 0.99212598 1. 0.98828125 1. 1. 
0.99215686 0.99215686] mean value: 0.992558186660491 MCC on Blind test: 0.77 Accuracy on Blind test: 0.92 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01462245 0.01250768 0.01081085 0.01020527 0.01017833 0.01032519 0.01021409 0.01031613 0.01019001 0.01003003] mean value: 0.010940003395080566 key: score_time value: [0.01250744 0.00965738 0.00924444 0.00913548 0.00906038 0.00908661 0.00905704 0.00912213 0.00909281 0.00910807] mean value: 0.009507179260253906 key: test_mcc value: [0.59358067 0.6166424 0.75492611 0.58562417 0.53881591 0.71611487 0.65814518 0.5728919 0.72168784 0.78571429] mean value: 0.6544143324803186 key: train_mcc value: [0.73570695 0.73999638 0.7556462 0.70845665 0.69724436 0.72930229 0.77174925 0.73248786 0.78860037 0.67735436] mean value: 0.7336544661308108 key: test_accuracy value: [0.78947368 0.80701754 0.87719298 0.78947368 0.76785714 0.85714286 0.82142857 0.78571429 0.85714286 0.89285714] mean value: 0.82453007518797 key: train_accuracy value: [0.8678501 0.86982249 0.87771203 0.85404339 0.84448819 0.86417323 0.88582677 0.86614173 0.89370079 0.83858268] mean value: 0.8662341393716319 key: test_fscore value: [0.80645161 0.79245283 0.87719298 0.77777778 0.75471698 0.86206897 0.83870968 0.77777778 
0.84615385 0.89285714] mean value: 0.8226159594183262 key: train_fscore value: [0.8678501 0.87209302 0.87890625 0.85603113 0.8315565 0.86756238 0.88671875 0.86770428 0.89655172 0.84046693] mean value: 0.8665441063880106 key: test_precision value: [0.73529412 0.84 0.89285714 0.84 0.8 0.83333333 0.76470588 0.80769231 0.91666667 0.89285714] mean value: 0.8323406593406594 key: train_precision value: [0.86956522 0.85877863 0.86872587 0.84291188 0.90697674 0.84644195 0.87984496 0.85769231 0.87313433 0.83076923] mean value: 0.8634841109277654 key: test_recall value: [0.89285714 0.75 0.86206897 0.72413793 0.71428571 0.89285714 0.92857143 0.75 0.78571429 0.89285714] mean value: 0.8193349753694581 key: train_recall value: [0.86614173 0.88582677 0.88932806 0.86956522 0.76771654 0.88976378 0.89370079 0.87795276 0.92125984 0.8503937 ] mean value: 0.8711649186144222 key: test_roc_auc value: [0.79125616 0.80603448 0.87746305 0.79064039 0.76785714 0.85714286 0.82142857 0.78571429 0.85714286 0.89285714] mean value: 0.8247536945812808 key: train_roc_auc value: [0.86785347 0.86979086 0.8777349 0.85407395 0.84448819 0.86417323 0.88582677 0.86614173 0.89370079 0.83858268] mean value: 0.8662366561887274 key: test_jcc value: [0.67567568 0.65625 0.78125 0.63636364 0.60606061 0.75757576 0.72222222 0.63636364 0.73333333 0.80645161] mean value: 0.7011546480498093 key: train_jcc value: [0.76655052 0.77319588 0.78397213 0.74829932 0.71167883 0.76610169 0.79649123 0.76632302 0.8125 0.72483221] mean value: 0.7649944838022477 MCC on Blind test: 0.43 Accuracy on Blind test: 0.78 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01057291 0.01041102 0.01036787 0.01058412 0.01035976 0.01047611 0.01075315 0.01133537 0.01048946 0.01053786] mean value: 0.010588765144348145 key: score_time value: [0.00910211 0.00900173 0.00892186 0.0090301 0.00900674 0.00920486 0.00922561 0.00952029 0.00915504 0.00927591] mean value: 0.009144425392150879 key: test_mcc value: [0.80817326 0.57973205 0.43842365 0.43842365 0.57735027 0.64285714 0.58501794 0.64285714 0.47187011 0.53605627] mean value: 0.5720761462070288 key: train_mcc value: [0.59369456 0.64499463 0.64499463 0.63709364 0.62999938 0.56756289 0.64173726 0.60292787 0.63779528 0.59849942] mean value: 0.6199299551759412 key: test_accuracy value: [0.89473684 0.78947368 0.71929825 0.71929825 0.78571429 0.82142857 0.78571429 0.82142857 0.73214286 0.76785714] mean value: 0.7837092731829574 key: train_accuracy value: [0.79684418 0.82248521 0.82248521 0.81854043 0.81496063 0.78346457 0.82086614 0.8011811 0.81889764 0.7992126 ] mean value: 0.8098937706751153 key: test_fscore value: [0.90322581 0.77777778 0.72413793 0.72413793 0.76923077 0.82142857 0.80645161 0.82142857 0.70588235 0.76363636] mean value: 0.7817337687867034 key: train_fscore value: [0.79684418 0.82213439 0.82283465 0.81746032 0.81640625 0.77822581 0.82121807 0.79678068 0.81889764 0.79761905] mean value: 0.8088421032567705 key: test_precision value: [0.82352941 0.80769231 0.72413793 0.72413793 0.83333333 0.82142857 0.73529412 0.82142857 0.7826087 0.77777778] mean value: 0.7851368648793466 key: train_precision value: [0.79841897 0.82539683 0.81960784 0.82071713 0.81007752 0.79752066 0.81960784 0.81481481 
0.81889764 0.804 ] mean value: 0.8129059248624415 key: test_recall value: [1. 0.75 0.72413793 0.72413793 0.71428571 0.82142857 0.89285714 0.82142857 0.64285714 0.75 ] mean value: 0.7841133004926109 key: train_recall value: [0.79527559 0.81889764 0.82608696 0.81422925 0.82283465 0.75984252 0.82283465 0.77952756 0.81889764 0.79133858] mean value: 0.8049765024431235 key: test_roc_auc value: [0.89655172 0.7887931 0.71921182 0.71921182 0.78571429 0.82142857 0.78571429 0.82142857 0.73214286 0.76785714] mean value: 0.7838054187192118 key: train_roc_auc value: [0.79684728 0.8224923 0.8224923 0.81853195 0.81496063 0.78346457 0.82086614 0.8011811 0.81889764 0.7992126 ] mean value: 0.8098946500264542 key: test_jcc value: [0.82352941 0.63636364 0.56756757 0.56756757 0.625 0.6969697 0.67567568 0.6969697 0.54545455 0.61764706] mean value: 0.6452744857156621 key: train_jcc value: [0.66229508 0.69798658 0.69899666 0.69127517 0.68976898 0.6369637 0.69666667 0.66220736 0.69333333 0.66336634] mean value: 0.6792859850212573 MCC on Blind test: 0.2 Accuracy on Blind test: 0.69 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01003718 0.01101255 0.01126194 0.01116419 0.01099014 0.01097584 0.01107812 0.01125479 0.01113129 0.01112223] mean value: 0.011002826690673827 key: score_time value: [0.01278234 0.0125072 0.01366591 0.01346803 0.01265216 0.01326346 0.0132699 0.01311994 0.01899934 0.01349974] mean value: 0.013722801208496093 key: test_mcc value: [0.62036458 0.54377353 0.45409716 0.66268617 0.57735027 0.68965631 0.53881591 0.68250015 0.46697379 0.60753044] mean value: 0.5843748306991364 key: train_mcc value: [0.76941166 0.76071428 0.79980738 0.76082422 0.79356189 0.79286644 0.77572829 0.76054069 0.74294954 0.78364389] mean value: 0.7740048286091203 key: test_accuracy value: [0.78947368 0.77192982 0.71929825 0.8245614 0.78571429 0.83928571 0.76785714 0.83928571 0.73214286 0.80357143] mean value: 0.787312030075188 key: train_accuracy value: [0.8816568 0.87771203 0.8974359 0.87771203 0.89566929 0.89173228 0.88582677 0.87795276 0.87007874 0.88976378] mean value: 0.884554038733324 key: test_fscore value: [0.81818182 0.76363636 0.75757576 0.84375 0.8 0.85245902 0.77966102 0.84745763 0.71698113 0.8 ] mean value: 0.797970273193065 key: train_fscore value: [0.88888889 0.88475836 0.90262172 0.88432836 0.89943074 0.89945155 0.89138577 0.88432836 0.8754717 0.89513109] mean value: 0.8905796538479781 key: test_precision value: [0.71052632 0.77777778 0.67567568 0.77142857 0.75 0.78787879 0.74193548 0.80645161 0.76 0.81481481] mean value: 0.7596489040139295 key: train_precision value: [0.83916084 0.83802817 0.85765125 0.83745583 0.86813187 0.83959044 0.85 0.84042553 0.84057971 0.85357143] mean 
value: 0.8464595066564342 key: test_recall value: [0.96428571 0.75 0.86206897 0.93103448 0.85714286 0.92857143 0.82142857 0.89285714 0.67857143 0.78571429] mean value: 0.847167487684729 key: train_recall value: [0.94488189 0.93700787 0.95256917 0.93675889 0.93307087 0.96850394 0.93700787 0.93307087 0.91338583 0.94094488] mean value: 0.9397202078989139 key: test_roc_auc value: [0.79248768 0.77155172 0.71674877 0.8226601 0.78571429 0.83928571 0.76785714 0.83928571 0.73214286 0.80357143] mean value: 0.7871305418719212 key: train_roc_auc value: [0.88153185 0.87759485 0.89754443 0.87782827 0.89566929 0.89173228 0.88582677 0.87795276 0.87007874 0.88976378] mean value: 0.8845523015156702 key: test_jcc value: [0.69230769 0.61764706 0.6097561 0.72972973 0.66666667 0.74285714 0.63888889 0.73529412 0.55882353 0.66666667] mean value: 0.6658637590560116 key: train_jcc value: [0.8 0.79333333 0.8225256 0.79264214 0.81724138 0.81727575 0.80405405 0.79264214 0.77852349 0.81016949] mean value: 0.8028407373870428 MCC on Blind test: 0.24 Accuracy on Blind test: 0.7 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02933192 0.02366972 0.0223372 0.02245879 0.02203465 0.02228165 0.02257109 0.02298236 0.02250648 0.02265596] mean value: 0.023282980918884276 key: score_time value: [0.01401258 0.01203036 0.01217723 0.01243663 0.01200104 0.01210237 0.01222324 0.0121057 0.01228499 0.01206851] mean value: 0.012344264984130859 key: test_mcc value: [0.75047877 0.7589669 0.7257422 0.96551724 0.71428571 0.89342711 0.73127242 0.68250015 0.64450339 0.93094934] mean value: 0.7797643240554251 key: train_mcc value: [0.83222561 0.85928385 0.83474492 0.8364528 0.84004879 0.88213591 0.84756752 0.84293789 0.84004879 0.85869374] mean value: 0.8474139833983546 key: test_accuracy value: [0.85964912 0.87719298 0.85964912 0.98245614 0.85714286 0.94642857 0.85714286 0.83928571 0.82142857 0.96428571] mean value: 0.8864661654135338 key: train_accuracy value: [0.91518738 0.92899408 0.91715976 0.91715976 0.91929134 0.94094488 0.92322835 0.92125984 0.91929134 0.92913386] mean value: 0.9231650592492506 key: test_fscore value: [0.875 0.88135593 0.87096774 0.98245614 0.85714286 0.94545455 0.87096774 0.84745763 0.81481481 0.96551724] mean value: 0.8911134642335407 key: train_fscore value: [0.91809524 0.93103448 0.91828794 0.91984733 0.92160612 0.94163424 0.92514395 0.92248062 0.92160612 0.93023256] mean value: 0.9249968597409465 key: test_precision value: [0.77777778 0.83870968 0.81818182 1. 
0.85714286 0.96296296 0.79411765 0.80645161 0.84615385 0.93333333] mean value: 0.8634831532934 key: train_precision value: [0.88929889 0.90671642 0.90421456 0.88929889 0.89591078 0.93076923 0.90262172 0.90839695 0.89591078 0.91603053] mean value: 0.9039168759145274 key: test_recall value: [1. 0.92857143 0.93103448 0.96551724 0.85714286 0.92857143 0.96428571 0.89285714 0.78571429 1. ] mean value: 0.9253694581280788 key: train_recall value: [0.9488189 0.95669291 0.93280632 0.95256917 0.9488189 0.95275591 0.9488189 0.93700787 0.9488189 0.94488189] mean value: 0.9471989667299493 key: test_roc_auc value: [0.86206897 0.87807882 0.85837438 0.98275862 0.85714286 0.94642857 0.85714286 0.83928571 0.82142857 0.96428571] mean value: 0.8866995073891626 key: train_roc_auc value: [0.91512091 0.92893934 0.91719056 0.91722947 0.91929134 0.94094488 0.92322835 0.92125984 0.91929134 0.92913386] mean value: 0.9231629890137251 key: test_jcc value: [0.77777778 0.78787879 0.77142857 0.96551724 0.75 0.89655172 0.77142857 0.73529412 0.6875 0.93333333] mean value: 0.8076710125011343 key: train_jcc value: [0.84859155 0.87096774 0.84892086 0.85159011 0.85460993 0.88970588 0.86071429 0.85611511 0.85460993 0.86956522] mean value: 0.8605390612075907 MCC on Blind test: 0.66 Accuracy on Blind test: 0.88 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.33146954 2.55629253 2.25351214 2.16796517 2.04874682 2.0742538 2.40498137 2.58180976 2.5396452 2.15567756] mean value: 2.3114353895187376 key: score_time value: [0.01615882 0.01403785 0.01418424 0.026016 0.01434445 0.01499295 0.03440094 0.01265144 0.01818609 0.02401257] mean value: 0.018898534774780273 key: test_mcc value: [0.9321832 0.89988258 0.93202124 0.96547546 0.85933785 0.89342711 0.89802651 0.89802651 0.96490128 0.96490128] mean value: 0.9208183019462288 key: train_mcc value: [0.99606293 0.99606293 0.99606299 0.99606299 1. 0.99607071 0.99212598 1. 0.99212598 0.99607071] mean value: 0.9960645238155992 key: test_accuracy value: [0.96491228 0.94736842 0.96491228 0.98245614 0.92857143 0.94642857 0.94642857 0.94642857 0.98214286 0.98214286] mean value: 0.9591791979949874 key: train_accuracy value: [0.99802761 0.99802761 0.99802761 0.99802761 1. 0.9980315 0.99606299 1. 0.99606299 0.9980315 ] mean value: 0.9980299430026868 key: test_fscore value: [0.96551724 0.94915254 0.96666667 0.98305085 0.93103448 0.94545455 0.94915254 0.94915254 0.98181818 0.98245614] mean value: 0.9603455733004473 key: train_fscore value: [0.99803536 0.99803536 0.99802761 0.99802761 1. 0.99803536 0.99606299 1. 
0.99606299 0.99803536] mean value: 0.9980322664907467 key: test_precision value: [0.93333333 0.90322581 0.93548387 0.96666667 0.9 0.96296296 0.90322581 0.90322581 1. 0.96551724] mean value: 0.9373641494664854 key: train_precision value: [0.99607843 0.99607843 0.99606299 0.99606299 1. 0.99607843 0.99606299 1. 0.99606299 0.99607843] mean value: 0.9968565693994134 key: test_recall value: [1. 1. 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 0.99606299 1. 0.99606299 1. ] mean value: 0.9992125984251968 key: test_roc_auc value: [0.96551724 0.94827586 0.96428571 0.98214286 0.92857143 0.94642857 0.94642857 0.94642857 0.98214286 0.98214286] mean value: 0.9592364532019705 key: train_roc_auc value: [0.99802372 0.99802372 0.9980315 0.9980315 1. 0.9980315 0.99606299 1. 0.99606299 0.9980315 ] mean value: 0.9980299399333976 key: test_jcc value: [0.93333333 0.90322581 0.93548387 0.96666667 0.87096774 0.89655172 0.90322581 0.90322581 0.96428571 0.96551724] mean value: 0.924248371206102 key: train_jcc value: [0.99607843 0.99607843 0.99606299 0.99606299 1. 0.99607843 0.99215686 1. 
0.99215686 0.99607843] mean value: 0.9960753435232361 MCC on Blind test: 0.72 Accuracy on Blind test: 0.9 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03225613 0.02523661 0.02279449 0.02276397 0.02029443 0.02316093 0.02214956 0.02145815 0.02218223 0.02365208] mean value: 0.02359485626220703 key: score_time value: [0.01220226 0.00912762 0.00895953 0.00886941 0.00896859 0.00906873 0.00915599 0.00904799 0.00930071 0.00928521] mean value: 0.009398603439331054 key: test_mcc value: [1. 0.9321832 1. 1. 0.73127242 1. 0.89802651 0.93094934 0.92857143 1. ] mean value: 0.9421002898482495 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.96491228 1. 1. 0.85714286 1. 0.94642857 0.96428571 0.96428571 1. ] mean value: 0.9697055137844611 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.96551724 1. 1. 0.87096774 1. 0.94915254 0.96551724 0.96428571 1. ] mean value: 0.9715440481352701 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93333333 1. 1. 0.79411765 1. 0.90322581 0.93333333 0.96428571 1. 
] mean value: 0.9528295834462818 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 1. 1. 1. 0.96428571 1. ] mean value: 0.9928571428571429 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.96551724 1. 1. 0.85714286 1. 0.94642857 0.96428571 0.96428571 1. ] mean value: 0.9697660098522167 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.93333333 1. 1. 0.77142857 1. 0.90322581 0.93333333 0.93103448 1. ] mean value: 0.9472355527305472 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12735271 0.1331346 0.13445926 0.12076545 0.1190021 0.1195786 0.11903858 0.12229443 0.12137938 0.11910081] mean value: 0.12361059188842774 key: score_time value: [0.0200398 0.02035165 0.02030778 0.01841545 0.01809835 0.01818275 0.01851726 0.01960111 0.01816249 0.01938963] mean value: 0.019106626510620117 key: test_mcc value: [0.96551724 0.96551724 0.96547546 1. 
0.89342711 0.89342711 0.93094934 0.96490128 0.96490128 1. ] mean value: 0.9544116062967756 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 0.98245614 0.98245614 1. 0.94642857 0.94642857 0.96428571 0.98214286 0.98214286 1. ] mean value: 0.9768796992481202 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 0.98245614 0.98305085 1. 0.94736842 0.94545455 0.96551724 0.98245614 0.98181818 1. ] mean value: 0.9770577658214927 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96551724 0.96551724 0.96666667 1. 0.93103448 0.96296296 0.93333333 0.96551724 1. 1. ] mean value: 0.9690549169859515 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 0.98275862 0.98214286 1. 0.94642857 0.94642857 0.96428571 0.98214286 0.98214286 1. ] mean value: 0.976908866995074 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 0.96551724 0.96666667 1. 0.9 0.89655172 0.93333333 0.96551724 0.96428571 1. ] mean value: 0.9557389162561577 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.6 Accuracy on Blind test: 0.88 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01039386 0.01049614 0.01059151 0.01052785 0.01056314 0.01058316 0.0104866 0.01042509 0.01065969 0.01042533] mean value: 0.010515236854553222 key: score_time value: [0.00886655 0.00900602 0.00947428 0.00894213 0.00898314 0.00901628 0.00893044 0.00892806 0.0092063 0.0089128 ] mean value: 0.009026598930358887 key: test_mcc value: [0.77903565 0.86189955 0.74822828 0.74822828 0.79385662 0.78772636 0.6882472 0.89802651 0.82618439 0.8660254 ] mean value: 0.7997458248025408 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.87719298 0.92982456 0.85964912 0.85964912 0.89285714 0.89285714 0.82142857 0.94642857 0.91071429 0.92857143] mean value: 0.8919172932330827 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.93103448 0.87878788 0.87878788 0.9 0.89655172 0.84848485 0.94915254 0.91525424 0.93333333] mean value: 0.9020275814840397 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.8 0.9 0.78378378 0.78378378 0.84375 0.86666667 0.73684211 0.90322581 0.87096774 0.875 ] mean value: 0.8364019887884488 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96428571 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9821428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.87931034 0.93041872 0.85714286 0.85714286 0.89285714 0.89285714 0.82142857 0.94642857 0.91071429 0.92857143] mean value: 0.8916871921182267 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.87096774 0.78378378 0.78378378 0.81818182 0.8125 0.73684211 0.90322581 0.84375 0.875 ] mean value: 0.822803503939964 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.53 Accuracy on Blind test: 0.86 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.72128773 1.77692485 1.74783516 1.83495116 1.73265433 1.9805038 1.75580668 1.82116747 1.85459757 1.74639463] mean value: 1.797212338447571 key: score_time value: [0.10111117 0.09629679 0.09658933 0.11028838 0.09789872 0.10091949 0.10473514 0.12026238 0.09599161 0.09527445] mean value: 0.10193674564361573 key: test_mcc value: [1. 0.96551724 1. 1. 0.89342711 0.93094934 0.93094934 0.93094934 0.92857143 1. ] mean value: 0.9580363791069356 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.98245614 1. 1. 0.94642857 0.96428571 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.9786027568922305 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.98245614 1. 1. 0.94736842 0.96296296 0.96551724 0.96551724 0.96428571 1. ] mean value: 0.9788107721410807 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96551724 1. 1. 0.93103448 1. 0.93333333 0.93333333 0.96428571 1. ] mean value: 0.9727504105090312 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.98275862 1. 1. 0.94642857 0.96428571 0.96428571 0.96428571 0.96428571 1. ] mean value: 0.9786330049261084 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 0.96551724 1. 1. 0.9 0.92857143 0.93333333 0.93333333 0.93103448 1. ] mean value: 0.9591789819376026 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.79 Accuracy on Blind test: 0.93 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, 
random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers.
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.93988156 0.98185563 0.95095325 0.95623612 1.0021174 1.01571774 0.96705437 0.9789834 1.06533718 0.96144629] mean value: 0.9819582939147949 key: score_time value: [0.17660475 0.25053692 0.26670814 0.16711521 0.25986362 0.19897556 0.23333478 0.21554923 0.25029039 0.24634266] mean value: 0.22653212547302246 key: test_mcc value: [0.96551724 0.8951918 1. 1. 0.93094934 0.93094934 0.93094934 0.93094934 1. 1. 
] mean value: 0.958450638898162 key: train_mcc value: [0.97660378 0.9685613 0.98046755 0.97275888 0.98437404 0.98050495 0.98437404 0.98437404 0.98050495 0.98050495] mean value: 0.9793028481235929 key: test_accuracy value: [0.98245614 0.94736842 1. 1. 0.96428571 0.96428571 0.96428571 0.96428571 1. 1. ] mean value: 0.9786967418546366 key: train_accuracy value: [0.98816568 0.98422091 0.99013807 0.98619329 0.99212598 0.99015748 0.99212598 0.99212598 0.99015748 0.99015748] mean value: 0.9895568342418737 key: test_fscore value: [0.98245614 0.94545455 1. 1. 0.96551724 0.96296296 0.96551724 0.96551724 1. 1. ] mean value: 0.9787425372906317 key: train_fscore value: [0.98832685 0.984375 0.99021526 0.98635478 0.9921875 0.99025341 0.9921875 0.9921875 0.99025341 0.99025341] mean value: 0.9896594622183483 key: test_precision value: [0.96551724 0.96296296 1. 1. 0.93333333 1. 0.93333333 0.93333333 1. 1. ] mean value: 0.9728480204342274 key: train_precision value: [0.97692308 0.97674419 0.98062016 0.97307692 0.98449612 0.98069498 0.98449612 0.98449612 0.98069498 0.98069498] mean value: 0.9802937655263236 key: test_recall value: [1. 0.92857143 1. 1. 1. 0.92857143 1. 1. 1. 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 0.99212598 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9992125984251968 key: test_roc_auc value: [0.98275862 0.94704433 1. 1. 0.96428571 0.96428571 0.96428571 0.96428571 1. 1. ] mean value: 0.9786945812807882 key: train_roc_auc value: [0.98814229 0.98420528 0.99015748 0.98622047 0.99212598 0.99015748 0.99212598 0.99212598 0.99015748 0.99015748] mean value: 0.9895575923562915 key: test_jcc value: [0.96551724 0.89655172 1. 1. 0.93333333 0.92857143 0.93333333 0.93333333 1. 1. 
] mean value: 0.959064039408867 key: train_jcc value: [0.97692308 0.96923077 0.98062016 0.97307692 0.98449612 0.98069498 0.98449612 0.98449612 0.98069498 0.98069498] mean value: 0.9795424238447494 MCC on Blind test: 0.83 Accuracy on Blind test: 0.94 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02351713 0.01024127 0.01087022 0.01184297 0.01054907 0.01104808 0.01173329 0.01041317 0.0110364 0.01116943] mean value: 0.01224210262298584 key: score_time value: [0.0089438 0.00970793 0.00972652 0.00989699 0.00950623 0.00932384 0.0098536 0.00955272 0.00994015 0.00911736] mean value: 0.009556913375854492 key: test_mcc value: [0.80817326 0.57973205 0.43842365 0.43842365 0.57735027 0.64285714 0.58501794 0.64285714 0.47187011 0.53605627] mean value: 0.5720761462070288 key: train_mcc value: [0.59369456 0.64499463 0.64499463 0.63709364 0.62999938 0.56756289 0.64173726 0.60292787 0.63779528 0.59849942] mean value: 0.6199299551759412 key: test_accuracy value: [0.89473684 0.78947368 0.71929825 0.71929825 0.78571429 0.82142857 0.78571429 0.82142857 0.73214286 0.76785714] mean value: 0.7837092731829574 key: train_accuracy value: [0.79684418 0.82248521 0.82248521 0.81854043 0.81496063 0.78346457 0.82086614 0.8011811 0.81889764 
0.7992126 ] mean value: 0.8098937706751153 key: test_fscore value: [0.90322581 0.77777778 0.72413793 0.72413793 0.76923077 0.82142857 0.80645161 0.82142857 0.70588235 0.76363636] mean value: 0.7817337687867034 key: train_fscore value: [0.79684418 0.82213439 0.82283465 0.81746032 0.81640625 0.77822581 0.82121807 0.79678068 0.81889764 0.79761905] mean value: 0.8088421032567705 key: test_precision value: [0.82352941 0.80769231 0.72413793 0.72413793 0.83333333 0.82142857 0.73529412 0.82142857 0.7826087 0.77777778] mean value: 0.7851368648793466 key: train_precision value: [0.79841897 0.82539683 0.81960784 0.82071713 0.81007752 0.79752066 0.81960784 0.81481481 0.81889764 0.804 ] mean value: 0.8129059248624415 key: test_recall value: [1. 0.75 0.72413793 0.72413793 0.71428571 0.82142857 0.89285714 0.82142857 0.64285714 0.75 ] mean value: 0.7841133004926109 key: train_recall value: [0.79527559 0.81889764 0.82608696 0.81422925 0.82283465 0.75984252 0.82283465 0.77952756 0.81889764 0.79133858] mean value: 0.8049765024431235 key: test_roc_auc value: [0.89655172 0.7887931 0.71921182 0.71921182 0.78571429 0.82142857 0.78571429 0.82142857 0.73214286 0.76785714] mean value: 0.7838054187192118 key: train_roc_auc value: [0.79684728 0.8224923 0.8224923 0.81853195 0.81496063 0.78346457 0.82086614 0.8011811 0.81889764 0.7992126 ] mean value: 0.8098946500264542 key: test_jcc value: [0.82352941 0.63636364 0.56756757 0.56756757 0.625 0.6969697 0.67567568 0.6969697 0.54545455 0.61764706] mean value: 0.6452744857156621 key: train_jcc value: [0.66229508 0.69798658 0.69899666 0.69127517 0.68976898 0.6369637 0.69666667 0.66220736 0.69333333 0.66336634] mean value: 0.6792859850212573 MCC on Blind test: 0.2 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08453417 0.06654429 0.07418919 0.08602238 0.06946802 0.08242369 0.06857514 0.07357121 0.07413006 0.07933426] mean value: 0.07587924003601074 key: score_time value: [0.01240754 0.01080799 0.01115561 0.01121211 0.01110983 0.01113629 0.01076126 0.01157236 0.01180458 0.0112443 ] mean value: 0.011321187019348145 key: test_mcc value: [1. 0.9321832 1. 0.96547546 0.89802651 1. 0.96490128 0.93094934 0.96490128 1. ] mean value: 0.965643706501215 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.96491228 1. 0.98245614 0.94642857 1. 0.98214286 0.96428571 0.98214286 1. ] mean value: 0.9822368421052632 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.96551724 1. 
0.98305085 0.94915254 1. 0.98245614 0.96551724 0.98245614 1. ] mean value: 0.9828150153290883 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93333333 1. 0.96666667 0.90322581 1. 0.96551724 0.93333333 0.96551724 1. ] mean value: 0.9667593622543567 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.96551724 1. 0.98214286 0.94642857 1. 0.98214286 0.96428571 0.98214286 1. ] mean value: 0.9822660098522168 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.93333333 1. 0.96666667 0.90322581 1. 0.96551724 0.93333333 0.96551724 1. ] mean value: 0.9667593622543567 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05864573 0.08753681 0.07722974 0.0898056 0.0872829 0.07127666 0.08494568 0.06207895 0.05319452 0.06795883] mean value: 0.07399554252624511 key: score_time value: [0.01858926 0.01282024 0.02478695 0.01922059 0.01901817 0.01264048 0.01914215 0.01234865 0.01900911 0.01905823] mean value: 0.01766338348388672 key: test_mcc value: [0.86851042 0.82512315 0.8953202 0.86789789 0.82195294 0.96490128 0.82618439 0.82618439 0.96490128 0.96490128] mean value: 0.8825877226392335 key: train_mcc value: [0.96055211 0.97239383 0.95266254 0.95661511 0.96850394 0.95670033 0.96850394 0.97250878 0.96062992 0.96062992] mean value: 0.9629700415209543 key: test_accuracy value: [0.92982456 0.9122807 0.94736842 0.92982456 0.91071429 0.98214286 0.91071429 0.91071429 0.98214286 0.98214286] mean value: 0.9397869674185464 key: train_accuracy value: [0.98027613 0.98619329 0.97633136 0.97830375 0.98425197 0.97834646 0.98425197 0.98622047 0.98031496 0.98031496] mean value: 0.9814805323890727 key: test_fscore value: [0.93333333 0.9122807 0.94736842 0.93548387 0.90909091 0.98245614 0.91525424 0.91525424 0.98245614 0.98245614] mean value: 0.9415434131827904 key: train_fscore value: [0.98031496 0.98624754 0.97628458 0.97830375 0.98425197 0.978389 0.98425197 0.98613861 0.98031496 0.98031496] mean value: 0.9814812307513464 key: test_precision value: [0.875 0.89655172 0.96428571 0.87878788 0.92592593 0.96551724 0.87096774 0.87096774 0.96551724 0.96551724] mean value: 0.9179038451146349 key: train_precision value: [0.98031496 0.98431373 0.97628458 0.97637795 0.98425197 0.97647059 0.98425197 
0.99203187 0.98031496 0.98031496] mean value: 0.9814927542869231 key: test_recall value: [1. 0.92857143 0.93103448 1. 0.89285714 1. 0.96428571 0.96428571 1. 1. ] mean value: 0.968103448275862 key: train_recall value: [0.98031496 0.98818898 0.97628458 0.98023715 0.98425197 0.98031496 0.98425197 0.98031496 0.98031496 0.98031496] mean value: 0.9814789455665868 key: test_roc_auc value: [0.93103448 0.91256158 0.9476601 0.92857143 0.91071429 0.98214286 0.91071429 0.91071429 0.98214286 0.98214286] mean value: 0.9398399014778326 key: train_roc_auc value: [0.98027606 0.98618935 0.97633127 0.97830755 0.98425197 0.97834646 0.98425197 0.98622047 0.98031496 0.98031496] mean value: 0.9814805016961813 key: test_jcc value: [0.875 0.83870968 0.9 0.87878788 0.83333333 0.96551724 0.84375 0.84375 0.96551724 0.96551724] mean value: 0.8909882613678498 key: train_jcc value: [0.96138996 0.97286822 0.95366795 0.95752896 0.96899225 0.95769231 0.96899225 0.97265625 0.96138996 0.96138996] mean value: 0.9636568066237398 MCC on Blind test: 0.61 Accuracy on Blind test: 0.86 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01440048 0.01289797 0.00995111 0.0097754 0.00982618 0.01017928 0.00993538 0.00999284 0.01034856 0.00996208] mean value: 0.0107269287109375 key: score_time value: [0.01222897 0.00954247 0.00876284 0.0087254 0.00896811 0.00879383 0.00874734 0.00874972 0.00899076 0.00880122] mean value: 0.009231066703796387 key: test_mcc value: [0.64889453 0.64901478 0.6166424 0.58076493 0.50128041 0.61065803 0.52174919 0.50128041 0.53881591 0.67900461] mean value: 0.5848105187193532 key: train_mcc value: [0.61406315 0.6462136 0.65745192 0.5827872 0.67819632 0.57508846 0.6265721 0.67097829 0.68811802 0.54745203] mean value: 0.6286921092607803 key: test_accuracy value: [0.80701754 0.8245614 0.80701754 0.78947368 0.75 0.80357143 0.75 0.75 0.76785714 0.83928571] mean value: 0.7888784461152882 key: train_accuracy value: [0.80670611 0.82248521 0.82840237 0.79092702 0.83858268 0.78740157 0.81299213 0.83464567 0.84251969 0.77362205] mean value: 0.813828448958673 key: test_fscore value: [0.83076923 0.82142857 0.81967213 0.78571429 0.74074074 0.81355932 0.78125 0.75862069 0.75471698 0.84210526] mean value: 0.7948577215779411 key: train_fscore value: [0.81153846 0.82824427 0.83172147 0.79615385 0.84291188 0.79069767 0.81695568 0.84030418 0.84962406 0.77669903] mean value: 0.8184850560127853 key: test_precision value: [0.72972973 0.82142857 0.78125 0.81481481 0.76923077 0.77419355 0.69444444 0.73333333 0.8 0.82758621] mean value: 0.7746011418265312 key: train_precision value: [0.79323308 0.8037037 0.81439394 0.7752809 0.82089552 0.77862595 0.8 0.8125 0.81294964 0.76628352] mean value: 
0.7977866266459333 key: test_recall value: [0.96428571 0.82142857 0.86206897 0.75862069 0.71428571 0.85714286 0.89285714 0.78571429 0.71428571 0.85714286] mean value: 0.8227832512315271 key: train_recall value: [0.83070866 0.85433071 0.84980237 0.81818182 0.86614173 0.80314961 0.83464567 0.87007874 0.88976378 0.78740157] mean value: 0.8404204662164265 key: test_roc_auc value: [0.80972906 0.82450739 0.80603448 0.79002463 0.75 0.80357143 0.75 0.75 0.76785714 0.83928571] mean value: 0.7891009852216748 key: train_roc_auc value: [0.80665868 0.82242227 0.82844449 0.79098067 0.83858268 0.78740157 0.81299213 0.83464567 0.84251969 0.77362205] mean value: 0.8138269895116865 key: test_jcc value: [0.71052632 0.6969697 0.69444444 0.64705882 0.58823529 0.68571429 0.64102564 0.61111111 0.60606061 0.72727273] mean value: 0.6608418946035045 key: train_jcc value: [0.6828479 0.70684039 0.71192053 0.66134185 0.72847682 0.65384615 0.69055375 0.72459016 0.73856209 0.63492063] mean value: 0.6933900281480951 MCC on Blind test: 0.59 Accuracy on Blind test: 0.83 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01517582 0.02547216 0.03090811 0.02699566 0.02723742 0.02942634 0.02590513 0.02685237 0.03164697 0.02517891] mean value: 0.026479887962341308 key: score_time value: [0.01010776 0.01122403 0.01200104 0.01206923 0.01203632 0.01194191 0.01193762 0.01199579 0.01200461 0.01850152] mean value: 0.012381982803344727 key: test_mcc value: [0.96551724 0.82512315 0.89988258 0.96551724 0.78772636 0.96490128 0.79385662 0.89802651 0.89342711 0.92857143] mean value: 0.8922549527400198 key: train_mcc value: [0.90714511 0.97239383 0.942062 0.94550473 0.95687833 0.93843444 0.95687833 0.98050495 0.97649905 0.96463421] mean value: 0.954093498873719 key: test_accuracy value: [0.98245614 0.9122807 0.94736842 0.98245614 0.89285714 0.98214286 0.89285714 0.94642857 0.94642857 0.96428571] mean value: 0.9449561403508772 key: train_accuracy value: [0.95266272 0.98619329 0.9704142 0.97238659 0.97834646 0.96850394 0.97834646 0.99015748 0.98818898 0.98228346] mean value: 0.9767483576387271 key: test_fscore value: [0.98245614 0.9122807 0.94545455 0.98245614 0.89655172 0.98245614 0.9 0.94915254 0.94736842 0.96428571] mean value: 0.9462462070110721 key: train_fscore value: [0.95121951 0.98624754 0.96957404 0.97177419 0.9785575 0.96934866 0.9785575 0.99025341 0.98828125 0.98217822] mean value: 0.9765991834337232 key: test_precision value: [0.96551724 0.89655172 1. 1. 
0.86666667 0.96551724 0.84375 0.90322581 0.93103448 0.96428571] mean value: 0.9336548877059166 key: train_precision value: [0.98319328 0.98431373 0.99583333 0.99176955 0.96911197 0.94402985 0.96911197 0.98069498 0.98062016 0.98804781] mean value: 0.9786726616928444 key: test_recall value: [1. 0.92857143 0.89655172 0.96551724 0.92857143 1. 0.96428571 1. 0.96428571 0.96428571] mean value: 0.9612068965517242 key: train_recall value: [0.92125984 0.98818898 0.94466403 0.95256917 0.98818898 0.99606299 0.98818898 1. 0.99606299 0.97637795] mean value: 0.9751563910242446 key: test_roc_auc value: [0.98275862 0.91256158 0.94827586 0.98275862 0.89285714 0.98214286 0.89285714 0.94642857 0.94642857 0.96428571] mean value: 0.9451354679802956 key: train_roc_auc value: [0.95272478 0.98618935 0.97036351 0.97234758 0.97834646 0.96850394 0.97834646 0.99015748 0.98818898 0.98228346] mean value: 0.9767451993402011 key: test_jcc value: [0.96551724 0.83870968 0.89655172 0.96551724 0.8125 0.96551724 0.81818182 0.90322581 0.9 0.93103448] mean value: 0.8996755233087269 key: train_jcc value: [0.90697674 0.97286822 0.94094488 0.94509804 0.95801527 0.94052045 0.95801527 0.98069498 0.97683398 0.96498054] mean value: 0.9544948365069599 MCC on Blind test: 0.6 Accuracy on Blind test: 0.82 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02158213 0.01965857 0.02065611 0.01798463 0.01837373 0.01744699 0.01999712 0.0184269 0.02369738 0.02024031] mean value: 0.019806385040283203 key: score_time value: [0.01195168 0.01195455 0.01196837 0.01192522 0.01188898 0.011935 0.01193929 0.01191378 0.01197457 0.01197529] mean value: 0.011942672729492187 key: test_mcc value: [0.9321832 0.64058163 0.89952865 1. 0.75434227 0.89802651 0.74535599 0.8660254 0.75047877 0.93094934] mean value: 0.841747176363628 key: train_mcc value: [0.94524716 0.82552467 0.88136732 0.90342654 0.89677099 0.92228969 0.92779624 0.92985478 0.98038334 0.89990029] mean value: 0.9112561021351867 key: test_accuracy value: [0.96491228 0.78947368 0.94736842 1. 0.875 0.94642857 0.85714286 0.92857143 0.875 0.96428571] mean value: 0.9148182957393484 key: train_accuracy value: [0.97238659 0.90532544 0.93885602 0.95069034 0.94685039 0.96062992 0.96259843 0.96456693 0.99015748 0.9488189 ] mean value: 0.9540880429887092 key: test_fscore value: [0.96551724 0.82352941 0.95081967 1. 0.88135593 0.94339623 0.875 0.93333333 0.87272727 0.96551724] mean value: 0.9211196331333564 key: train_fscore value: [0.972 0.91366906 0.94139887 0.95219885 0.9489603 0.95967742 0.96394687 0.96525097 0.99021526 0.95057034] mean value: 0.9557887945831837 key: test_precision value: [0.93333333 0.7 0.90625 1. 0.83870968 1. 
0.77777778 0.875 0.88888889 0.93333333] mean value: 0.8853293010752689 key: train_precision value: [0.98780488 0.8410596 0.90217391 0.92222222 0.91272727 0.98347107 0.93040293 0.9469697 0.9844358 0.91911765] mean value: 0.9330385035167746 key: test_recall value: [1. 1. 1. 1. 0.92857143 0.89285714 1. 1. 0.85714286 1. ] mean value: 0.9678571428571429 key: train_recall value: [0.95669291 1. 0.98418972 0.98418972 0.98818898 0.93700787 1. 0.98425197 0.99606299 0.98425197] mean value: 0.9814836139553702 key: test_roc_auc value: [0.96551724 0.79310345 0.94642857 1. 0.875 0.94642857 0.85714286 0.92857143 0.875 0.96428571] mean value: 0.9151477832512316 key: train_roc_auc value: [0.9724176 0.90513834 0.93894526 0.95075628 0.94685039 0.96062992 0.96259843 0.96456693 0.99015748 0.9488189 ] mean value: 0.9540879524446796 key: test_jcc value: [0.93333333 0.7 0.90625 1. 0.78787879 0.89285714 0.77777778 0.875 0.77419355 0.93333333] mean value: 0.8580623923567472 key: train_jcc value: [0.94552529 0.8410596 0.88928571 0.90875912 0.9028777 0.92248062 0.93040293 0.93283582 0.98062016 0.9057971 ] mean value: 0.9159644058634359 MCC on Blind test: 0.73 Accuracy on Blind test: 0.9 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), 
('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18949628 0.16845989 0.17046356 0.17315173 0.18008161 0.17156911 0.17026114 0.16903186 0.18008924 0.17638707] mean value: 0.17489914894104003 key: score_time value: [0.01661611 0.01543522 0.01551104 0.01576376 0.01544642 0.01590562 0.01548719 0.01546669 0.01654816 0.01654792] mean value: 0.015872812271118163 key: test_mcc value: [1. 0.96551724 1. 0.96547546 0.83484711 1. 0.96490128 0.93094934 0.96490128 1. ] mean value: 0.962659170679551 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.98245614 1. 0.98245614 0.91071429 1. 0.98214286 0.96428571 0.98214286 1. ] mean value: 0.9804197994987468 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.98245614 1. 0.98305085 0.91803279 1. 0.98245614 0.96551724 0.98245614 1. ] mean value: 0.9813969296774815 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96551724 1. 0.96666667 0.84848485 1. 0.96551724 0.93333333 0.96551724 1. ] mean value: 0.9645036572622779 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.98275862 1. 0.98214286 0.91071429 1. 0.98214286 0.96428571 0.98214286 1. ] mean value: 0.9804187192118227 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.96551724 1. 0.96666667 0.84848485 1. 
0.96551724 0.93333333 0.96551724 1. ] mean value: 0.9645036572622779 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05747509 0.07012534 0.08541584 0.09682059 0.08754301 0.08933735 0.07895207 0.0594759 0.07882285 0.08167696] mean value: 0.07856450080871583 key: score_time value: [0.01824808 0.03937316 0.03136182 0.03505492 0.03816462 0.03392482 0.03387403 0.02909517 0.02104235 0.0223937 ] mean value: 0.030253267288208006 key: test_mcc value: [1. 0.8951918 1. 0.96547546 0.83484711 1. 0.93094934 0.93094934 0.96490128 1. ] mean value: 0.9522314322910705 key: train_mcc value: [1. 0.99211042 0.99214142 0.99606299 1. 1. 0.99607071 0.99607071 0.99607071 0.99215674] mean value: 0.9960683715267692 key: test_accuracy value: [1. 0.94736842 1. 0.98245614 0.91071429 1. 0.96428571 0.96428571 0.98214286 1. ] mean value: 0.9751253132832081 key: train_accuracy value: [1. 0.99605523 0.99605523 0.99802761 1. 1. 0.9980315 0.9980315 0.9980315 0.99606299] mean value: 0.9980295547376105 key: test_fscore value: [1. 0.94545455 1. 0.98305085 0.91803279 1. 0.96551724 0.96551724 0.98245614 1. ] mean value: 0.9760028802906916 key: train_fscore value: [1. 0.99606299 0.99606299 0.99802761 1. 1. 0.99803536 0.99803536 0.99803536 0.99607843] mean value: 0.9980338119410027 key: test_precision value: [1. 0.96296296 1. 0.96666667 0.84848485 1. 0.93333333 0.93333333 0.96551724 1. ] mean value: 0.9610298386160455 key: train_precision value: [1. 0.99606299 0.99215686 0.99606299 1. 1. 0.99607843 0.99607843 0.99607843 0.9921875 ] mean value: 0.9964705641114714 key: test_recall value: [1. 0.92857143 1. 1. 1. 1. 1. 1. 1. 1. 
] mean value: 0.9928571428571429 key: train_recall value: [1. 0.99606299 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9996062992125985 key: test_roc_auc value: [1. 0.94704433 1. 0.98214286 0.91071429 1. 0.96428571 0.96428571 0.98214286 1. ] mean value: 0.9750615763546798 key: train_roc_auc value: [1. 0.99605521 0.99606299 0.9980315 1. 1. 0.9980315 0.9980315 0.9980315 0.99606299] mean value: 0.9980307179981949 key: test_jcc value: [1. 0.89655172 1. 0.96666667 0.84848485 1. 0.93333333 0.93333333 0.96551724 1. ] mean value: 0.9543887147335424 key: train_jcc value: [1. 0.99215686 0.99215686 0.99606299 1. 1. 0.99607843 0.99607843 0.99607843 0.9921875 ] mean value: 0.9960799511733828 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', 
random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.17560363 0.20586491 0.16315055 0.14851332 0.21609211 0.3052907 0.25830936 0.24186087 0.25000715 0.23897648] mean value: 0.220366907119751 key: score_time value: [0.02612948 0.02948737 0.01560521 0.02712798 0.03579521 0.02561378 0.02915239 0.02643967 0.0419426 0.02583075] mean value: 0.02831244468688965 key: test_mcc value: [0.83797038 0.82942474 0.77728159 0.96547546 0.76225171 0.82195294 0.80439967 0.93094934 0.89342711 0.96490128] mean value: 0.8588034211725772 key: train_mcc value: [0.98434291 0.98434291 0.98823511 0.98823511 0.98437404 0.99607071 0.98437404 0.98437404 0.98825791 0.99215674] mean value: 0.9874763527102344 key: test_accuracy value: [0.9122807 0.9122807 0.87719298 0.98245614 0.875 0.91071429 0.89285714 0.96428571 0.94642857 0.98214286] mean value: 0.925563909774436 key: train_accuracy value: [0.99211045 0.99211045 0.99408284 0.99408284 0.99212598 0.9980315 0.99212598 0.99212598 0.99409449 0.99606299] mean value: 0.9936953516905062 key: test_fscore value: [0.91803279 0.91525424 0.89230769 0.98305085 0.8852459 0.9122807 0.90322581 0.96551724 0.94736842 0.98245614] mean value: 0.9304739776566863 key: train_fscore value: [0.9921875 0.9921875 0.99410609 0.99410609 0.9921875 0.99803536 0.9921875 0.9921875 0.99412916 0.99607843] mean value: 0.9937392634089591 key: test_precision value: [0.84848485 0.87096774 0.80555556 0.96666667 0.81818182 0.89655172 0.82352941 0.93333333 0.93103448 0.96551724] mean value: 0.8859822824198275 key: train_precision value: [0.98449612 0.98449612 0.98828125 0.98828125 0.98449612 0.99607843 0.98449612 
0.98449612 0.98832685 0.9921875 ] mean value: 0.9875635899776615 key: test_recall value: [1. 0.96428571 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9821428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9137931 0.91317734 0.875 0.98214286 0.875 0.91071429 0.89285714 0.96428571 0.94642857 0.98214286] mean value: 0.9255541871921182 key: train_roc_auc value: [0.99209486 0.99209486 0.99409449 0.99409449 0.99212598 0.9980315 0.99212598 0.99212598 0.99409449 0.99606299] mean value: 0.9936945628831969 key: test_jcc value: [0.84848485 0.84375 0.80555556 0.96666667 0.79411765 0.83870968 0.82352941 0.93333333 0.9 0.96551724] mean value: 0.8719664381662598 key: train_jcc value: [0.98449612 0.98449612 0.98828125 0.98828125 0.98449612 0.99607843 0.98449612 0.98449612 0.98832685 0.9921875 ] mean value: 0.9875635899776615 MCC on Blind test: 0.38 Accuracy on Blind test: 0.8 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.67622995 0.6558485 0.66100764 0.65075469 0.66132975 0.65230036 0.65977788 0.65869689 0.66375089 0.65631771] mean value: 0.6596014261245727 key: score_time value: [0.00952291 0.00961614 0.0093987 0.00940108 0.00963163 0.00945449 0.00950718 0.00944567 0.00945544 0.0094955 ] mean value: 0.009492874145507812 key: test_mcc value: [1. 0.96551724 1. 1. 0.89802651 1. 0.93094934 0.93094934 0.96490128 1. ] mean value: 0.9690343705369726 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.98245614 1. 1. 0.94642857 1. 0.96428571 0.96428571 0.98214286 1. ] mean value: 0.9839598997493735 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.98245614 1. 1. 0.94915254 1. 0.96551724 0.96551724 0.98245614 1. ] mean value: 0.9845099305833256 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96551724 1. 1. 0.90322581 1. 0.93333333 0.93333333 0.96551724 1. ] mean value: 0.9700926955876901 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.98275862 1. 1. 0.94642857 1. 0.96428571 0.96428571 0.98214286 1. ] mean value: 0.9839901477832512 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.96551724 1. 1. 0.90322581 1. 0.93333333 0.93333333 0.96551724 1. 
] mean value: 0.9700926955876901 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), 
('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03066301 0.03192091 0.03159785 0.03225589 0.03112817 0.0315671 0.05233073 0.07486224 0.05127001 0.05236316] mean value: 0.041995906829833986 key: score_time value: [0.01229358 0.01268935 0.01368308 0.01742101 0.01394725 0.01394153 0.01992059 0.0179832 0.01861358 0.02085924] mean value: 0.016135239601135255 key: test_mcc value: [1. 0.96547546 0.96547546 1. 0.96490128 0.89342711 1. 1. 0.96490128 1. ] mean value: 0.9754180588113227 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.98245614 0.98245614 1. 0.98214286 0.94642857 1. 1. 0.98214286 1. ] mean value: 0.987562656641604 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.98181818 0.98305085 1. 0.98181818 0.94545455 1. 1. 0.98181818 1. ] mean value: 0.9873959938366718 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96666667 1. 1. 0.96296296 1. 1. 1. 1. ] mean value: 0.9929629629629629 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [1. 0.96428571 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9821428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.98214286 0.98214286 1. 0.98214286 0.94642857 1. 1. 0.98214286 1. ] mean value: 0.9875 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.96428571 0.96666667 1. 0.96428571 0.89655172 1. 1. 0.96428571 1. ] mean value: 0.9756075533661741 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.05 Accuracy on Blind test: 0.78 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', 
use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03958082 0.03368187 0.04933643 0.03995538 0.03961086 0.03955531 0.04075122 0.02445912 0.06261611 0.03506374] mean value: 0.04046108722686768 key: score_time value: [0.03454447 0.01914907 0.01997089 0.0189991 0.01908541 0.01905084 0.02588892 0.02134299 0.02253413 0.02127409] mean value: 0.022183990478515624 key: test_mcc value: [0.9321832 0.8615634 0.8953202 1. 0.82195294 1. 
0.79385662 0.85933785 0.82195294 0.93094934] mean value: 0.8917116490649181 key: train_mcc value: [0.95269145 0.9605814 0.94872553 0.96055211 0.96853396 0.95670033 0.95675965 0.96853396 0.95670033 0.95670033] mean value: 0.9586479053242231 key: test_accuracy value: [0.96491228 0.92982456 0.94736842 1. 0.91071429 1. 0.89285714 0.92857143 0.91071429 0.96428571] mean value: 0.9449248120300752 key: train_accuracy value: [0.97633136 0.98027613 0.97435897 0.98027613 0.98425197 0.97834646 0.97834646 0.98425197 0.97834646 0.97834646] mean value: 0.9793132367329823 key: test_fscore value: [0.96551724 0.92592593 0.94736842 1. 0.9122807 1. 0.9 0.93103448 0.90909091 0.96551724] mean value: 0.9456734923341094 key: train_fscore value: [0.97647059 0.98039216 0.97435897 0.98023715 0.98431373 0.978389 0.97847358 0.98418972 0.97830375 0.978389 ] mean value: 0.9793517647236116 key: test_precision value: [0.93333333 0.96153846 0.96428571 1. 0.89655172 1. 0.84375 0.9 0.92592593 0.93333333] mean value: 0.93587184925547 key: train_precision value: [0.97265625 0.9765625 0.97244094 0.98023715 0.98046875 0.97647059 0.97276265 0.98809524 0.98023715 0.97647059] mean value: 0.9776401813662509 key: test_recall value: [1. 0.89285714 0.93103448 1. 0.92857143 1. 0.96428571 0.96428571 0.89285714 1. ] mean value: 0.9573891625615764 key: train_recall value: [0.98031496 0.98425197 0.97628458 0.98023715 0.98818898 0.98031496 0.98425197 0.98031496 0.97637795 0.98031496] mean value: 0.9810852447791852 key: test_roc_auc value: [0.96551724 0.92918719 0.9476601 1. 0.91071429 1. 0.89285714 0.92857143 0.91071429 0.96428571] mean value: 0.9449507389162561 key: train_roc_auc value: [0.97632349 0.98026828 0.97436276 0.98027606 0.98425197 0.97834646 0.97834646 0.98425197 0.97834646 0.97834646] mean value: 0.9793120351062837 key: test_jcc value: [0.93333333 0.86206897 0.9 1. 0.83870968 1. 
0.81818182 0.87096774 0.83333333 0.93333333] mean value: 0.8989928203053899 key: train_jcc value: [0.95402299 0.96153846 0.95 0.96124031 0.96911197 0.95769231 0.95785441 0.9688716 0.95752896 0.95769231] mean value: 0.9595553303608277 MCC on Blind test: 0.76 Accuracy on Blind test: 0.91 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.30773807 0.29322624 0.32750988 0.32462645 0.31639314 0.32397628 0.31056094 0.29872632 0.36270189 0.40175366] mean value: 0.3267212867736816 key: score_time value: [0.01916409 0.01916027 0.01916218 0.01910424 0.01910162 0.01980019 0.01962781 0.02352262 0.0193069 0.01963854] mean value: 0.019758844375610353 key: test_mcc value: [0.9321832 0.8615634 0.8953202 1. 0.75047877 1. 0.79385662 0.85933785 0.82195294 0.93094934] mean value: 0.8845642321659994 key: train_mcc value: [0.95269145 0.9605814 0.94872553 0.96055211 0.9645744 0.95670033 0.95675965 0.96853396 0.95670033 0.95670033] mean value: 0.9582519495785499 key: test_accuracy value: [0.96491228 0.92982456 0.94736842 1. 0.875 1. 
0.89285714 0.92857143 0.91071429 0.96428571] mean value: 0.9413533834586466 key: train_accuracy value: [0.97633136 0.98027613 0.97435897 0.98027613 0.98228346 0.97834646 0.97834646 0.98425197 0.97834646 0.97834646] mean value: 0.9791163863392816 key: test_fscore value: [0.96551724 0.92592593 0.94736842 1. 0.87719298 1. 0.9 0.93103448 0.90909091 0.96551724] mean value: 0.9421647204042848 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:148: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:151: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.97647059 0.98039216 0.97435897 0.98023715 0.98231827 0.978389 0.97847358 0.98418972 0.97830375 0.978389 ] mean value: 0.9791522192865763 key: test_precision value: [0.93333333 0.96153846 0.96428571 1. 0.86206897 1. 
0.84375 0.9 0.92592593 0.93333333] mean value: 0.932423573393401 key: train_precision value: [0.97265625 0.9765625 0.97244094 0.98023715 0.98039216 0.97647059 0.97276265 0.98809524 0.98023715 0.97647059] mean value: 0.9776325220525254 key: test_recall value: [1. 0.89285714 0.93103448 1. 0.89285714 1. 0.96428571 0.96428571 0.89285714 1. ] mean value: 0.9538177339901478 key: train_recall value: [0.98031496 0.98425197 0.97628458 0.98023715 0.98425197 0.98031496 0.98425197 0.98031496 0.97637795 0.98031496] mean value: 0.9806915439917837 key: test_roc_auc value: [0.96551724 0.92918719 0.9476601 1. 0.875 1. 0.89285714 0.92857143 0.91071429 0.96428571] mean value: 0.9413793103448276 key: train_roc_auc value: [0.97632349 0.98026828 0.97436276 0.98027606 0.98228346 0.97834646 0.97834646 0.98425197 0.97834646 0.97834646] mean value: 0.979115184712583 key: test_jcc value: [0.93333333 0.86206897 0.9 1. 0.78125 1. 0.81818182 0.87096774 0.83333333 0.93333333] mean value: 0.8932468525634544 key: train_jcc value: [0.95402299 0.96153846 0.95 0.96124031 0.96525097 0.95769231 0.95785441 0.9688716 0.95752896 0.95769231] mean value: 0.9591692299747274 MCC on Blind test: 0.76 Accuracy on Blind test: 0.91 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02975988 0.03407001 0.0321157 0.04390645 0.05316114 0.07947993 0.02710176 0.03776312 0.04052353 0.02836466] mean value: 0.04062461853027344 key: score_time value: [0.01219678 0.01215196 0.01284242 0.01259923 0.01233125 0.01213431 0.01271582 0.01214218 0.01203442 0.0119319 ] mean value: 0.012308025360107422 key: test_mcc value: [0.62994079 1. 0.6000992 0.66143783 0.87287156 0.87287156 0.32732684 0.66143783 0.32732684 0.53452248] mean value: 0.6487834918450751 key: train_mcc value: [0.8979331 0.88273483 0.89791134 0.86948194 0.8687127 0.8687127 0.91277477 0.91240409 0.8687127 0.91277477] mean value: 0.8892152945204359 key: test_accuracy value: [0.8125 1. 0.8 0.8 0.93333333 0.93333333 0.66666667 0.8 0.66666667 0.73333333] mean value: 0.8145833333333333 key: train_accuracy value: [0.94852941 0.94117647 0.94890511 0.93430657 0.93430657 0.93430657 0.95620438 0.95620438 0.93430657 0.95620438] mean value: 0.9444450407900387 key: test_fscore value: [0.82352941 1. 0.76923077 0.82352941 0.92307692 0.92307692 0.70588235 0.76923077 0.70588235 0.8 ] mean value: 0.8243438914027149 key: train_fscore value: [0.94736842 0.94029851 0.94890511 0.93333333 0.93430657 0.93430657 0.95522388 0.95588235 0.93430657 0.95522388] mean value: 0.9439155193502107 key: test_precision value: [0.77777778 1. 0.83333333 0.7 1. 1. 0.66666667 1. 
0.66666667 0.66666667] mean value: 0.8311111111111111 key: train_precision value: [0.96923077 0.95454545 0.95588235 0.95454545 0.94117647 0.94117647 0.96969697 0.95588235 0.92753623 0.96969697] mean value: 0.95393694966585 key: test_recall value: [0.875 1. 0.71428571 1. 0.85714286 0.85714286 0.75 0.625 0.75 1. ] mean value: 0.8428571428571429 key: train_recall value: [0.92647059 0.92647059 0.94202899 0.91304348 0.92753623 0.92753623 0.94117647 0.95588235 0.94117647 0.94117647] mean value: 0.9342497868712702 key: test_roc_auc value: [0.8125 1. 0.79464286 0.8125 0.92857143 0.92857143 0.66071429 0.8125 0.66071429 0.71428571] mean value: 0.8125 key: train_roc_auc value: [0.94852941 0.94117647 0.94895567 0.93446292 0.93435635 0.93435635 0.95609548 0.95620205 0.93435635 0.95609548] mean value: 0.944458653026428 key: test_jcc value: [0.7 1. 0.625 0.7 0.85714286 0.85714286 0.54545455 0.625 0.54545455 0.66666667] mean value: 0.7121861471861471 key: train_jcc value: [0.9 0.88732394 0.90277778 0.875 0.87671233 0.87671233 0.91428571 0.91549296 0.87671233 0.91428571] mean value: 0.8939303094059027 MCC on Blind test: 0.67 Accuracy on Blind test: 0.87 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.8173337 1.18391466 0.87638569 1.19149208 0.98477817 0.98145533 1.09080052 0.8652916 0.64874887 0.81251073] mean value: 0.9452711343765259 key: score_time value: [0.01361656 0.01263309 0.01250887 0.0123415 0.0226686 0.01354671 0.01257586 0.01254773 0.01364255 0.01389027] mean value: 0.013997173309326172 key: test_mcc value: [0.62994079 1. 
0.6000992 0.76376262 0.87287156 0.73214286 0.32732684 0.66143783 0.73214286 0.53452248] mean value: 0.6854247024498333 key: train_mcc value: [0.94158382 0.94117647 0.95630861 0.97122151 0.91240409 1. 0.8251228 0.95629932 1. 0.98550418] mean value: 0.9489620795067616 key: test_accuracy value: [0.8125 1. 0.8 0.86666667 0.93333333 0.86666667 0.66666667 0.8 0.86666667 0.73333333] mean value: 0.8345833333333333 key: train_accuracy value: [0.97058824 0.97058824 0.97810219 0.98540146 0.95620438 1. 0.91240876 0.97810219 1. 0.99270073] mean value: 0.9744096178617433 key: test_fscore value: [0.82352941 1. 0.76923077 0.875 0.92307692 0.85714286 0.70588235 0.76923077 0.875 0.8 ] mean value: 0.8398093083387201 key: train_fscore value: [0.97014925 0.97058824 0.97810219 0.98529412 0.95652174 1. 0.91044776 0.97777778 1. 0.99259259] mean value: 0.9741473667148377 key: test_precision value: [0.77777778 1. 0.83333333 0.77777778 1. 0.85714286 0.66666667 1. 0.875 0.66666667] mean value: 0.8454365079365079 key: train_precision value: [0.98484848 0.97058824 0.98529412 1. 0.95652174 1. 0.92424242 0.98507463 1. 1. ] mean value: 0.9806569628028192 key: test_recall value: [0.875 1. 0.71428571 1. 0.85714286 0.85714286 0.75 0.625 0.875 1. ] mean value: 0.8553571428571428 key: train_recall value: [0.95588235 0.97058824 0.97101449 0.97101449 0.95652174 1. 0.89705882 0.97058824 1. 0.98529412] mean value: 0.9677962489343563 key: test_roc_auc value: [0.8125 1. 0.79464286 0.875 0.92857143 0.86607143 0.66071429 0.8125 0.86607143 0.71428571] mean value: 0.8330357142857143 key: train_roc_auc value: [0.97058824 0.97058824 0.97815431 0.98550725 0.95620205 1. 0.91229753 0.97804774 1. 0.99264706] mean value: 0.9744032395566923 key: test_jcc value: [0.7 1. 0.625 0.77777778 0.85714286 0.75 0.54545455 0.625 0.77777778 0.66666667] mean value: 0.7324819624819625 key: train_jcc value: [0.94202899 0.94285714 0.95714286 0.97101449 0.91666667 1. 0.83561644 0.95652174 1. 
0.98529412] mean value: 0.9507142440061194 MCC on Blind test: 0.73 Accuracy on Blind test: 0.9 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01516128 0.01172233 0.00925016 0.00884438 0.00867152 0.00968885 0.00902534 0.0096035 0.00953603 0.00910902] mean value: 0.010061240196228028 key: score_time value: [0.01382613 0.00929689 0.00917935 0.00864553 0.0086813 0.00937629 0.00872087 0.00905013 0.00939441 0.00907898] mean value: 0.009524989128112792 key: test_mcc value: [0.40451992 0.67419986 0.34247476 0.47245559 0.41931393 0.34247476 0.09449112 0.49099025 0.33928571 0.21821789] mean value: 0.3798423801118058 key: train_mcc value: [0.57294631 0.54111596 0.60664573 0.55776902 0.59674775 0.57705251 0.64235303 0.72621377 0.73836136 0.5626648 ] mean value: 0.6121870240894057 key: test_accuracy value: [0.6875 0.8125 0.66666667 0.73333333 0.66666667 0.66666667 0.53333333 0.73333333 0.66666667 0.6 ] mean value: 0.6766666666666666 key: train_accuracy value: [0.76470588 0.75 0.78832117 0.75912409 0.77372263 0.76642336 0.81021898 0.86131387 0.86861314 0.75912409] mean value: 0.7901567196221554 key: test_fscore value: [0.61538462 0.76923077 0.54545455 0.66666667 0.44444444 0.54545455 0.46153846 0.71428571 0.66666667 
0.57142857] mean value: 0.6000555000555 key: train_fscore value: [0.70909091 0.69090909 0.75213675 0.7079646 0.72072072 0.71428571 0.77966102 0.85271318 0.86363636 0.69724771] mean value: 0.7488366054215206 key: test_precision value: [0.8 1. 0.75 0.8 1. 0.75 0.6 0.83333333 0.71428571 0.66666667] mean value: 0.7914285714285715 key: train_precision value: [0.92857143 0.9047619 0.91666667 0.90909091 0.95238095 0.93023256 0.92 0.90163934 0.890625 0.92682927] mean value: 0.9180798032166374 key: test_recall value: [0.5 0.625 0.42857143 0.57142857 0.28571429 0.42857143 0.375 0.625 0.625 0.5 ] mean value: 0.49642857142857144 key: train_recall value: [0.57352941 0.55882353 0.63768116 0.57971014 0.57971014 0.57971014 0.67647059 0.80882353 0.83823529 0.55882353] mean value: 0.639151747655584 key: test_roc_auc value: [0.6875 0.8125 0.65178571 0.72321429 0.64285714 0.65178571 0.54464286 0.74107143 0.66964286 0.60714286] mean value: 0.6732142857142858 key: train_roc_auc value: [0.76470588 0.75 0.78942882 0.76044331 0.77514919 0.76779625 0.80924979 0.8609335 0.86839301 0.75767263] mean value: 0.7903772378516624 key: test_jcc value: [0.44444444 0.625 0.375 0.5 0.28571429 0.375 0.3 0.55555556 0.5 0.4 ] mean value: 0.43607142857142855 key: train_jcc value: [0.54929577 0.52777778 0.60273973 0.54794521 0.56338028 0.55555556 0.63888889 0.74324324 0.76 0.53521127] mean value: 0.6024037720915977 MCC on Blind test: 0.28 Accuracy on Blind test: 0.78 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00991607 0.00885415 0.0101788 0.01003098 0.00886106 0.00952673 0.00979686 0.010741 0.01050067 0.01020956] mean value: 0.009861588478088379 key: score_time value: [0.00879622 0.0093503 0.00995898 0.00946927 0.00864244 0.00944114 0.00965357 0.01037049 0.00998425 0.00960541] mean value: 0.009527206420898438 key: test_mcc value: [ 0.40451992 0.67419986 0.46428571 0.75592895 0.6000992 0.46428571 -0.19642857 0.37796447 0.47245559 0.32732684] mean value: 0.43446376808762277 key: train_mcc value: [0.67911938 0.60616144 0.68986702 0.63862773 0.65701381 0.57996733 0.66746486 0.62437433 0.66581484 0.640228 ] mean value: 0.6448638739267128 key: test_accuracy value: [0.6875 0.8125 0.73333333 0.86666667 0.8 0.73333333 0.4 0.66666667 0.73333333 0.66666667] mean value: 0.71 key: train_accuracy value: [0.83823529 0.80147059 0.83941606 0.81751825 0.82481752 0.78832117 0.83211679 0.81021898 0.83211679 0.81751825] mean value: 0.8201749677973379 key: test_fscore value: [0.61538462 0.76923077 0.71428571 0.83333333 0.76923077 0.71428571 0.4 0.61538462 0.77777778 0.70588235] mean value: 0.6914795661854485 key: train_fscore value: [0.83076923 0.79069767 0.82539683 0.80916031 0.8125 0.77862595 0.82170543 0.796875 0.82442748 0.80314961] mean value: 0.8093307503698478 key: test_precision value: [0.8 1. 0.71428571 1. 
0.83333333 0.71428571 0.42857143 0.8 0.7 0.66666667] mean value: 0.7657142857142857 key: train_precision value: [0.87096774 0.83606557 0.9122807 0.85483871 0.88135593 0.82258065 0.86885246 0.85 0.85714286 0.86440678] mean value: 0.8618491400322729 key: test_recall value: [0.5 0.625 0.71428571 0.71428571 0.71428571 0.71428571 0.375 0.5 0.875 0.75 ] mean value: 0.6482142857142857 key: train_recall value: [0.79411765 0.75 0.75362319 0.76811594 0.75362319 0.73913043 0.77941176 0.75 0.79411765 0.75 ] mean value: 0.7632139812446718 key: test_roc_auc value: [0.6875 0.8125 0.73214286 0.85714286 0.79464286 0.73214286 0.40178571 0.67857143 0.72321429 0.66071429] mean value: 0.7080357142857143 key: train_roc_auc value: [0.83823529 0.80147059 0.84004689 0.8178815 0.82534101 0.78868286 0.83173487 0.80978261 0.83184143 0.81702899] mean value: 0.8202046035805627 key: test_jcc value: [0.44444444 0.625 0.55555556 0.71428571 0.625 0.55555556 0.25 0.44444444 0.63636364 0.54545455] mean value: 0.5396103896103897 key: train_jcc value: [0.71052632 0.65384615 0.7027027 0.67948718 0.68421053 0.6375 0.69736842 0.66233766 0.7012987 0.67105263] mean value: 0.6800330294409241 MCC on Blind test: 0.26 Accuracy on Blind test: 0.69 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00962687 0.0089643 0.00940514 0.00952768 0.00951171 0.00911784 0.00948524 0.00956416 0.01086783 0.00937128] mean value: 0.00954420566558838 key: score_time value: [0.01055789 0.01027226 0.01053548 0.010499 0.01059937 0.00998354 0.01044297 0.01039433 0.01533222 0.01476574] mean value: 0.011338281631469726 key: test_mcc value: [ 0.25 0.12598816 0.32732684 0.60714286 0.64465837 0.73214286 -0.07142857 0.33928571 0.21821789 0.75592895] mean value: 0.3929263057641339 key: train_mcc value: [0.62196632 0.60300638 0.6647466 0.59240339 0.60592498 0.5644673 0.66581484 0.60584099 0.61074523 0.640228 ] mean value: 0.6175144024127495 key: test_accuracy value: [0.625 0.5625 0.66666667 0.8 0.8 0.86666667 0.46666667 0.66666667 0.6 0.86666667] mean value: 0.6920833333333334 key: train_accuracy value: [0.80882353 0.80147059 0.83211679 0.79562044 0.80291971 0.7810219 0.83211679 0.80291971 0.80291971 0.81751825] mean value: 0.8077447402318592 key: test_fscore value: [0.625 0.58823529 0.61538462 0.8 0.72727273 0.85714286 0.5 0.66666667 0.57142857 0.88888889] mean value: 0.6840019620901974 key: train_fscore value: [0.796875 0.8 0.83687943 0.79104478 0.80291971 0.77272727 0.82442748 0.8 0.78740157 0.80314961] mean value: 0.8015424851518379 key: test_precision value: [0.625 0.55555556 0.66666667 0.75 1. 
0.85714286 0.5 0.71428571 0.66666667 0.8 ] mean value: 0.7135317460317461 key: train_precision value: [0.85 0.80597015 0.81944444 0.81538462 0.80882353 0.80952381 0.85714286 0.80597015 0.84745763 0.86440678] mean value: 0.8284123961194615 key: test_recall value: [0.625 0.625 0.57142857 0.85714286 0.57142857 0.85714286 0.5 0.625 0.5 1. ] mean value: 0.6732142857142857 key: train_recall value: [0.75 0.79411765 0.85507246 0.76811594 0.79710145 0.73913043 0.79411765 0.79411765 0.73529412 0.75 ] mean value: 0.7777067348678601 key: test_roc_auc value: [0.625 0.5625 0.66071429 0.80357143 0.78571429 0.86607143 0.46428571 0.66964286 0.60714286 0.85714286] mean value: 0.6901785714285714 key: train_roc_auc value: [0.80882353 0.80147059 0.831948 0.79582268 0.80296249 0.78132992 0.83184143 0.80285592 0.80242967 0.81702899] mean value: 0.8076513213981245 key: test_jcc value: [0.45454545 0.41666667 0.44444444 0.66666667 0.57142857 0.75 0.33333333 0.5 0.4 0.8 ] mean value: 0.5337085137085137 key: train_jcc value: [0.66233766 0.66666667 0.7195122 0.65432099 0.67073171 0.62962963 0.7012987 0.66666667 0.64935065 0.67105263] mean value: 0.6691567497622268 MCC on Blind test: 0.26 Accuracy on Blind test: 0.64 Model_name: SVM Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01161861 0.01145601 0.01143718 0.01146555 0.01147413 0.01156044 0.01610374 0.01574993 0.01282072 0.01037717] mean value: 0.012406349182128906 key: score_time value: [0.01001382 0.01000714 0.00987005 0.00989294 0.01008201 0.00988507 0.01483679 0.01519465 0.01058769 0.00996661] mean value: 0.01103367805480957 key: test_mcc value: [0.75 0.5 0.6000992 0.56407607 0.6000992 0.87287156 0.18898224 0.66143783 0.18898224 0.53452248] mean value: 0.5461070816659918 key: train_mcc value: [0.83905224 0.76503685 0.83951407 0.79590547 0.83951407 0.79599234 0.85434012 0.83951407 0.8251228 0.81031543] mean value: 0.8204307458510867 key: test_accuracy value: [0.875 0.75 0.8 0.73333333 0.8 0.93333333 0.6 0.8 0.6 0.73333333] mean value: 0.7625 key: train_accuracy value: [0.91911765 0.88235294 0.91970803 0.89781022 0.91970803 0.89781022 0.9270073 0.91970803 0.91240876 0.90510949] mean value: 0.9100740661227995 key: test_fscore value: [0.875 0.75 0.76923077 0.77777778 0.76923077 0.92307692 0.66666667 0.76923077 0.66666667 0.8 ] mean value: 0.7766880341880341 key: train_fscore value: [0.91729323 0.88405797 0.91970803 0.9 0.91970803 0.89705882 0.92537313 0.91970803 0.91044776 0.90510949] mean value: 0.9098464499791336 key: test_precision value: [0.875 0.75 0.83333333 0.63636364 0.83333333 1. 0.6 1. 0.6 0.66666667] mean value: 0.7794696969696969 key: train_precision value: [0.93846154 0.87142857 0.92647059 0.88732394 0.92647059 0.91044776 0.93939394 0.91304348 0.92424242 0.89855072] mean value: 0.9135833557751614 key: test_recall value: [0.875 0.75 0.71428571 1. 
0.71428571 0.85714286 0.75 0.625 0.75 1. ] mean value: 0.8035714285714286 key: train_recall value: [0.89705882 0.89705882 0.91304348 0.91304348 0.91304348 0.88405797 0.91176471 0.92647059 0.89705882 0.91176471] mean value: 0.9064364876385337 key: test_roc_auc value: [0.875 0.75 0.79464286 0.75 0.79464286 0.92857143 0.58928571 0.8125 0.58928571 0.71428571] mean value: 0.7598214285714285 key: train_roc_auc value: [0.91911765 0.88235294 0.91975703 0.89769821 0.91975703 0.89791134 0.92689685 0.91975703 0.91229753 0.90515772] mean value: 0.9100703324808184 key: test_jcc value: [0.77777778 0.6 0.625 0.63636364 0.625 0.85714286 0.5 0.625 0.5 0.66666667] mean value: 0.6412950937950938 key: train_jcc value: [0.84722222 0.79220779 0.85135135 0.81818182 0.85135135 0.81333333 0.86111111 0.85135135 0.83561644 0.82666667] mean value: 0.8348393436133162 MCC on Blind test: 0.48 Accuracy on Blind test: 0.76 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.70389104 0.67864847 1.09095931 0.99975729 0.98860121 0.84449649 0.65940809 0.66229033 0.58026719 0.63522124] mean value: 0.7843540668487549 key: score_time value: [0.01230264 0.01353049 0.01278973 0.01525879 0.0178535 0.0141089 0.01234412 0.01248693 0.01246476 0.01245332] mean value: 0.013559317588806153 key: test_mcc value: [0.37796447 0.75 0.32732684 0.46770717 0.6000992 0.32732684 0.47245559 0.66143783 0.46428571 0.53452248] mean value: 0.4983126132351171 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.6875 0.875 0.66666667 0.66666667 0.8 0.66666667 0.73333333 0.8 0.73333333 0.73333333] mean value: 0.73625 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.70588235 0.875 0.61538462 0.73684211 0.76923077 0.61538462 0.77777778 0.76923077 0.75 0.8 ] mean value: 0.7414733005212881 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.875 0.66666667 0.58333333 0.83333333 0.66666667 0.7 1. 0.75 0.66666667] mean value: 0.7408333333333333 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.75 0.875 0.57142857 1. 0.71428571 0.57142857 0.875 0.625 0.75 1. ] mean value: 0.7732142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.6875 0.875 0.66071429 0.6875 0.79464286 0.66071429 0.72321429 0.8125 0.73214286 0.71428571] mean value: 0.7348214285714285 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.54545455 0.77777778 0.44444444 0.58333333 0.625 0.44444444 0.63636364 0.625 0.6 0.66666667] mean value: 0.5948484848484848 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.5 Accuracy on Blind test: 0.79 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01449776 0.01498055 0.01225662 0.01179147 0.01136565 0.01139188 0.0118866 0.01129723 0.01635385 0.01810718] mean value: 0.013392877578735352 key: score_time value: [0.01178002 0.01000428 0.00914884 0.00882721 0.00953579 0.00952053 0.00946212 0.00917983 0.01388979 0.0131166 ] mean value: 0.010446500778198243 key: test_mcc value: [0.77459667 0.8819171 0.875 1. 1. 0.87287156 0.75592895 0.87287156 0.875 1. 
] mean value: 0.8908185840836074 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 0.9375 0.93333333 1. 1. 0.93333333 0.86666667 0.93333333 0.93333333 1. ] mean value: 0.94125 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.93333333 0.93333333 1. 1. 0.92307692 0.88888889 0.94117647 0.93333333 1. ] mean value: 0.9410285139696904 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.875 1. 1. 1. 0.8 0.88888889 1. 1. ] mean value: 0.9563888888888888 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.875 1. 1. 1. 0.85714286 1. 1. 0.875 1. ] mean value: 0.9357142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 0.9375 0.9375 1. 1. 0.92857143 0.85714286 0.92857143 0.9375 1. ] mean value: 0.9401785714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.875 0.875 1. 1. 0.85714286 0.8 0.88888889 0.875 1. ] mean value: 0.8921031746031746 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.86 Accuracy on Blind test: 0.94 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09935236 0.10409498 0.12266827 0.1061933 0.09845328 0.10499287 0.09967995 0.09435582 0.08926463 0.09026933] mean value: 0.10093247890472412 key: score_time value: [0.02710509 0.01923895 0.01924515 0.01959372 0.01917553 0.02634406 0.01896572 0.01882887 0.01784062 0.01771426] mean value: 0.02040519714355469 key: test_mcc value: [0.5 0.62994079 0.6000992 0.49099025 0.87287156 0.73214286 0.19642857 0.66143783 0.47245559 0.53452248] mean value: 0.5690889131896603 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75 0.8125 0.8 0.73333333 0.93333333 0.86666667 0.6 0.8 0.73333333 0.73333333] mean value: 0.77625 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.82352941 0.76923077 0.75 0.92307692 0.85714286 0.625 0.76923077 0.77777778 0.8 ] mean value: 0.7844988508223802 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.77777778 0.83333333 0.66666667 1. 0.85714286 0.625 1. 
0.7 0.66666667] mean value: 0.7876587301587301 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.875 0.71428571 0.85714286 0.85714286 0.85714286 0.625 0.625 0.875 1. ] mean value: 0.8035714285714286 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.8125 0.79464286 0.74107143 0.92857143 0.86607143 0.59821429 0.8125 0.72321429 0.71428571] mean value: 0.7741071428571429 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.7 0.625 0.6 0.85714286 0.75 0.45454545 0.625 0.63636364 0.66666667] mean value: 0.6514718614718614 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.73 Accuracy on Blind test: 0.89 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01000309 0.0091064 0.00996566 0.00957346 0.01001763 0.01011038 0.00918674 0.009022 0.00903392 0.00947905] mean value: 0.009549832344055176 key: score_time value: [0.00933886 0.00877929 0.00922394 0.00939322 0.00928879 0.00938463 0.00908303 0.00888348 0.00878763 0.0089674 ] mean value: 0.009113025665283204 key: test_mcc value: [ 0.13483997 0.62994079 0.19642857 0.46428571 0.34247476 0.46428571 0.05455447 -0.19642857 0.19642857 0.20044593] mean value: 0.24872559245454634 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.5625 0.8125 0.6 0.73333333 0.66666667 0.73333333 0.53333333 0.4 0.6 0.6 ] mean value: 0.6241666666666666 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.63157895 0.82352941 0.57142857 0.71428571 0.54545455 0.71428571 0.58823529 0.4 0.625 0.7 ] mean value: 0.6313798198705319 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.54545455 0.77777778 0.57142857 0.71428571 0.75 0.71428571 0.55555556 0.42857143 0.625 0.58333333] mean value: 0.626569264069264 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.875 0.57142857 0.71428571 0.42857143 0.71428571 0.625 0.375 0.625 0.875 ] mean value: 0.6553571428571429 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.5625 0.8125 0.59821429 0.73214286 0.65178571 0.73214286 0.52678571 0.40178571 0.59821429 0.58035714] mean value: 0.6196428571428572 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.46153846 0.7 0.4 0.55555556 0.375 0.55555556 0.41666667 0.25 0.45454545 0.53846154] mean value: 0.4707323232323232 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.41 Accuracy on Blind test: 0.72 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.18459654 1.16294837 1.18245697 1.17819452 1.17918324 1.1352222 1.12796831 1.13954997 1.12182617 1.11643767] mean value: 1.152838397026062 key: score_time value: [0.09779596 0.09762621 0.09741259 0.09748626 0.08932686 0.0909512 0.09467626 0.09128833 0.15364361 0.09373999] mean value: 0.10039472579956055 key: test_mcc value: [0.77459667 1. 0.73214286 0.87287156 0.87287156 0.875 0.64465837 0.875 0.73214286 1. ] mean value: 0.837928387663544 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 1. 
0.86666667 0.93333333 0.93333333 0.93333333 0.8 0.93333333 0.86666667 1. ] mean value: 0.9141666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 1. 0.85714286 0.92307692 0.92307692 0.93333333 0.84210526 0.93333333 0.875 1. ] mean value: 0.9144211490264121 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.85714286 1. 1. 0.875 0.72727273 1. 0.875 1. ] mean value: 0.9334415584415584 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 1. 0.85714286 0.85714286 0.85714286 1. 1. 0.875 0.875 1. ] mean value: 0.9071428571428571 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 1. 0.86607143 0.92857143 0.92857143 0.9375 0.78571429 0.9375 0.86607143 1. ] mean value: 0.9125 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3.
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( mean value: 1.0 key: test_jcc value: [0.75 1. 0.75 0.85714286 0.85714286 0.875 0.72727273 0.875 0.77777778 1. ] mean value: 0.8469336219336219 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 MCC on Blind test: 0.8 Accuracy on Blind test: 0.92 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.79976606 0.85018516 0.8501091 0.89193416 0.93633461 0.90565896 0.87436795 0.92361069 0.89648938 0.87731433] mean value: 0.8805770397186279 key: score_time value: [0.23642707 0.20861435 0.2369709 0.24829936 0.22590995 0.23644233 0.15728378 0.22497702 0.20615268 0.19786143] mean value: 0.2178938865661621 key: test_mcc value: [0.8819171 0.8819171 0.73214286 0.87287156 0.87287156 1. 0.32732684 0.66143783 0.875 0.87287156] mean value: 0.7978356410471296 key: train_mcc value: [0.97058824 0.95598573 0.95630861 0.95713391 0.97080136 0.97080136 0.97120941 0.97080136 0.95629932 0.95629932] mean value: 0.9636228629898831 key: test_accuracy value: [0.9375 0.9375 0.86666667 0.93333333 0.93333333 1. 
0.66666667 0.8 0.93333333 0.93333333] mean value: 0.8941666666666667 key: train_accuracy value: [0.98529412 0.97794118 0.97810219 0.97810219 0.98540146 0.98540146 0.98540146 0.98540146 0.97810219 0.97810219] mean value: 0.9817249892657793 key: test_fscore value: [0.93333333 0.93333333 0.85714286 0.92307692 0.92307692 1. 0.70588235 0.76923077 0.93333333 0.94117647] mean value: 0.8919586296056884 key: train_fscore value: [0.98529412 0.97777778 0.97810219 0.97777778 0.98550725 0.98550725 0.98507463 0.98529412 0.97777778 0.97777778] mean value: 0.9815890655805546 key: test_precision value: [1. 1. 0.85714286 1. 1. 1. 0.66666667 1. 1. 0.88888889] mean value: 0.9412698412698413 key: train_precision value: [0.98529412 0.98507463 0.98529412 1. 0.98550725 0.98550725 1. 0.98529412 0.98507463 0.98507463] mean value: 0.9882120726291814 key: test_recall value: [0.875 0.875 0.85714286 0.85714286 0.85714286 1. 0.75 0.625 0.875 1. ] mean value: 0.8571428571428571 key: train_recall value: [0.98529412 0.97058824 0.97101449 0.95652174 0.98550725 0.98550725 0.97058824 0.98529412 0.97058824 0.97058824] mean value: 0.9751491901108269 key: test_roc_auc value: [0.9375 0.9375 0.86607143 0.92857143 0.92857143 1. 0.66071429 0.8125 0.9375 0.92857143] mean value: 0.89375 key: train_roc_auc value: [0.98529412 0.97794118 0.97815431 0.97826087 0.98540068 0.98540068 0.98529412 0.98540068 0.97804774 0.97804774] mean value: 0.9817242114237 key: test_jcc value: [0.875 0.875 0.75 0.85714286 0.85714286 1. 
0.54545455 0.625 0.875 0.88888889] mean value: 0.8148629148629148 key: train_jcc value: [0.97101449 0.95652174 0.95714286 0.95652174 0.97142857 0.97142857 0.97058824 0.97101449 0.95652174 0.95652174] mean value: 0.9638704177323103 MCC on Blind test: 0.85 Accuracy on Blind test: 0.94 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00971198 0.0104022 0.00985646 0.00999689 0.01531601 0.0123229 0.00914288 0.00894403 0.00899482 0.01425886] mean value: 0.010894703865051269 key: score_time value: [0.00989151 0.00964308 0.00926447 0.01156521 0.01512861 0.01090312 0.00868583 0.00870919 0.0092032 0.01466846] mean value: 0.010766267776489258 key: test_mcc value: [ 0.40451992 0.67419986 0.46428571 0.75592895 0.6000992 0.46428571 -0.19642857 0.37796447 0.47245559 0.32732684] mean value: 0.43446376808762277 key: train_mcc value: [0.67911938 0.60616144 0.68986702 0.63862773 0.65701381 0.57996733 0.66746486 0.62437433 0.66581484 0.640228 ] mean value: 0.6448638739267128 key: test_accuracy value: [0.6875 0.8125 0.73333333 0.86666667 0.8 0.73333333 0.4 0.66666667 0.73333333 0.66666667] mean value: 0.71 key: train_accuracy value: [0.83823529 0.80147059 0.83941606 0.81751825 0.82481752 0.78832117 0.83211679 0.81021898 0.83211679 0.81751825] mean value: 
0.8201749677973379 key: test_fscore value: [0.61538462 0.76923077 0.71428571 0.83333333 0.76923077 0.71428571 0.4 0.61538462 0.77777778 0.70588235] mean value: 0.6914795661854485 key: train_fscore value: [0.83076923 0.79069767 0.82539683 0.80916031 0.8125 0.77862595 0.82170543 0.796875 0.82442748 0.80314961] mean value: 0.8093307503698478 key: test_precision value: [0.8 1. 0.71428571 1. 0.83333333 0.71428571 0.42857143 0.8 0.7 0.66666667] mean value: 0.7657142857142857 key: train_precision value: [0.87096774 0.83606557 0.9122807 0.85483871 0.88135593 0.82258065 0.86885246 0.85 0.85714286 0.86440678] mean value: 0.8618491400322729 key: test_recall value: [0.5 0.625 0.71428571 0.71428571 0.71428571 0.71428571 0.375 0.5 0.875 0.75 ] mean value: 0.6482142857142857 key: train_recall value: [0.79411765 0.75 0.75362319 0.76811594 0.75362319 0.73913043 0.77941176 0.75 0.79411765 0.75 ] mean value: 0.7632139812446718 key: test_roc_auc value: [0.6875 0.8125 0.73214286 0.85714286 0.79464286 0.73214286 0.40178571 0.67857143 0.72321429 0.66071429] mean value: 0.7080357142857143 key: train_roc_auc value: [0.83823529 0.80147059 0.84004689 0.8178815 0.82534101 0.78868286 0.83173487 0.80978261 0.83184143 0.81702899] mean value: 0.8202046035805627 key: test_jcc value: [0.44444444 0.625 0.55555556 0.71428571 0.625 0.55555556 0.25 0.44444444 0.63636364 0.54545455] mean value: 0.5396103896103897 key: train_jcc value: [0.71052632 0.65384615 0.7027027 0.67948718 0.68421053 0.6375 0.69736842 0.66233766 0.7012987 0.67105263] mean value: 0.6800330294409241 MCC on Blind test: 0.26 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, 
n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.14137769 0.04036045 0.04341197 0.04366136 0.0674777 0.04775929 0.04809213 0.04925728 0.04873991 0.04506922] mean value: 0.057520699501037595 key: score_time value: [0.01150417 0.01063967 0.01087403 0.01018643 0.01102757 0.0106461 0.01055002 0.01029372 0.01260114 0.01114511] mean value: 0.01094679832458496 key: test_mcc value: [0.77459667 1. 1. 0.87287156 1. 0.875 0.87287156 1. 0.875 1. ] mean value: 0.9270339791129423 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 1. 1. 0.93333333 1. 0.93333333 0.93333333 1. 0.93333333 1. ] mean value: 0.9608333333333333 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 1. 1. 0.92307692 1. 0.93333333 0.94117647 1. 0.93333333 1. ] mean value: 0.9588062917474682 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 1. 0.875 0.88888889 1. 1. 1. ] mean value: 0.9763888888888889 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 1. 1. 0.85714286 1. 1. 1. 1. 0.875 1. ] mean value: 0.9482142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 1. 1. 0.92857143 1. 0.9375 0.92857143 1. 0.9375 1. ] mean value: 0.9607142857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 1. 1. 0.85714286 1. 0.875 0.88888889 1. 0.875 1. ] mean value: 0.9246031746031746 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, 
num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0207808 0.02300382 0.0433135 0.05539727 0.05441689 0.05075097 0.05279303 0.04780746 0.06620932 0.02348566] mean value: 0.04379587173461914 key: score_time value: [0.01240396 0.02174282 0.02213216 0.02098989 0.0220952 0.01775432 0.01941085 0.02169156 0.01264215 0.02482295] mean value: 0.019568586349487306 key: test_mcc value: [-0.12598816 0.12598816 0.47245559 0.32732684 0.04029115 0.46428571 -0.07142857 0.26189246 0.6000992 0.32732684] mean value: 0.24222492144851507 key: train_mcc value: [1. 0.98540068 1. 0.97080136 0.97080136 0.98550418 1. 0.98550725 1. 0.98550418] mean value: 0.9883519009251238 key: test_accuracy value: [0.4375 0.5625 0.73333333 0.66666667 0.53333333 0.73333333 0.46666667 0.6 0.8 0.66666667] mean value: 0.62 key: train_accuracy value: [1. 0.99264706 1. 0.98540146 0.98540146 0.99270073 1. 0.99270073 1. 0.99270073] mean value: 0.9941552168312581 key: test_fscore value: [0.4 0.58823529 0.66666667 0.61538462 0.36363636 0.71428571 0.5 0.5 0.82352941 0.70588235] mean value: 0.587762041879689 key: train_fscore value: [1. 0.99270073 1. 0.98550725 0.98550725 0.99280576 1. 0.99270073 1. 0.99259259] mean value: 0.9941814300595915 key: test_precision value: [0.42857143 0.55555556 0.8 0.66666667 0.5 0.71428571 0.5 0.75 0.77777778 0.66666667] mean value: 0.6359523809523809 key: train_precision value: [1. 0.98550725 1. 0.98550725 0.98550725 0.98571429 1. 0.98550725 1. 1. 
] mean value: 0.9927743271221532 key: test_recall value: [0.375 0.625 0.57142857 0.57142857 0.28571429 0.71428571 0.5 0.375 0.875 0.75 ] mean value: 0.5642857142857143 key: train_recall value: [1. 1. 1. 0.98550725 0.98550725 1. 1. 1. 1. 0.98529412] mean value: 0.9956308610400683 key: test_roc_auc value: [0.4375 0.5625 0.72321429 0.66071429 0.51785714 0.73214286 0.46428571 0.61607143 0.79464286 0.66071429] mean value: 0.6169642857142857 key: train_roc_auc value: [1. 0.99264706 1. 0.98540068 0.98540068 0.99264706 1. 0.99275362 1. 0.99264706] mean value: 0.9941496163682865 key: test_jcc value: [0.25 0.41666667 0.5 0.44444444 0.22222222 0.55555556 0.33333333 0.33333333 0.7 0.54545455] mean value: 0.4301010101010101 key: train_jcc value: [1. 0.98550725 1. 0.97142857 0.97142857 0.98571429 1. 0.98550725 1. 0.98529412] mean value: 0.988488003897211 MCC on Blind test: 0.61 Accuracy on Blind test: 0.86 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, 
max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02475095 0.00916529 0.00877476 0.00865769 0.01192904 0.00877118 0.00866795 0.00887179 0.01024485 0.00899529] mean value: 0.010882878303527832 key: score_time value: [0.00913358 0.00896764 0.00864172 0.00901842 0.00979638 0.00859523 0.00856853 0.00883698 0.00979352 0.01023078] mean value: 0.00915827751159668 key: test_mcc value: [0.67419986 0.75 0.73214286 0.21821789 0.87287156 0.75592895 0.19642857 0.66143783 0.32732684 0.34247476] mean value: 0.553102911106401 key: train_mcc value: [0.67676337 0.63406934 0.678815 0.63512361 0.73758262 0.66432225 0.79590547 0.69352089 0.73721228 0.62060153] mean value: 0.6873916366987387 key: test_accuracy value: [0.8125 0.875 0.86666667 0.6 0.93333333 0.86666667 0.6 0.8 0.66666667 0.66666667] mean value: 0.76875 key: train_accuracy value: [0.83823529 0.81617647 0.83941606 0.81751825 0.86861314 0.83211679 0.89781022 0.84671533 0.86861314 0.81021898] mean value: 0.8435433662516101 key: test_fscore value: [0.76923077 0.875 0.85714286 0.625 0.92307692 0.83333333 0.625 0.76923077 0.70588235 0.73684211] mean value: 0.7719739110218986 key: train_fscore value: [0.84057971 0.82269504 0.84057971 0.81751825 0.86764706 0.83211679 0.89552239 0.84671533 0.86764706 0.80597015] mean value: 0.8436991475674843 key: test_precision value: [1. 0.875 0.85714286 0.55555556 1. 1. 0.625 1. 
0.66666667 0.63636364] mean value: 0.8215728715728716 key: train_precision value: [0.82857143 0.79452055 0.84057971 0.82352941 0.88059701 0.83823529 0.90909091 0.84057971 0.86764706 0.81818182] mean value: 0.8441532903710471 key: test_recall value: [0.625 0.875 0.85714286 0.71428571 0.85714286 0.71428571 0.625 0.625 0.75 0.875 ] mean value: 0.7517857142857143 key: train_recall value: [0.85294118 0.85294118 0.84057971 0.8115942 0.85507246 0.82608696 0.88235294 0.85294118 0.86764706 0.79411765] mean value: 0.8436274509803922 key: test_roc_auc value: [0.8125 0.875 0.86607143 0.60714286 0.92857143 0.85714286 0.59821429 0.8125 0.66071429 0.65178571] mean value: 0.7669642857142858 key: train_roc_auc value: [0.83823529 0.81617647 0.8394075 0.81756181 0.8687127 0.83216113 0.89769821 0.84676044 0.86860614 0.8101023 ] mean value: 0.8435421994884911 key: test_jcc value: [0.625 0.77777778 0.75 0.45454545 0.85714286 0.71428571 0.45454545 0.625 0.54545455 0.58333333] mean value: 0.6387085137085137 key: train_jcc value: [0.725 0.69879518 0.725 0.69135802 0.76623377 0.7125 0.81081081 0.73417722 0.76623377 0.675 ] mean value: 0.7305108763882466 MCC on Blind test: 0.56 Accuracy on Blind test: 0.81 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0145216 0.01501608 0.01437402 0.01481843 0.01543903 0.01425052 0.01434255 0.01563168 0.014395 0.01469946] mean value: 0.014748835563659668 key: score_time value: [0.01158118 0.01175547 0.01154518 0.01160622 0.01153302 0.01158404 0.01175642 0.01160836 0.01161551 0.01179576] mean value: 0.011638116836547852 key: test_mcc value: [0.8819171 0.67419986 0.75592895 0.73214286 0.73214286 0.53452248 0.6000992 0.56407607 0.60714286 0.53452248] mean value: 0.6616694724214908 key: train_mcc value: [0.94280904 0.8623165 0.8130258 0.95713391 0.95629932 0.88938138 0.92944673 0.81250852 0.88920184 0.9158731 ] mean value: 0.8967996137276029 key: test_accuracy value: [0.9375 0.8125 0.86666667 0.86666667 0.86666667 0.73333333 0.8 0.73333333 0.8 0.73333333] mean value: 0.815 key: train_accuracy value: [0.97058824 0.92647059 0.89781022 0.97810219 0.97810219 0.94160584 0.96350365 0.89781022 0.94160584 0.95620438] mean value: 0.9451803349076857 key: test_fscore value: [0.93333333 0.76923077 0.83333333 0.85714286 0.85714286 0.6 0.82352941 0.66666667 0.8 0.8 ] mean value: 0.7940379228614522 key: train_fscore value: [0.96969697 0.92063492 0.88709677 0.97777778 0.97841727 0.93846154 0.96183206 0.8852459 0.9375 0.95384615] mean value: 0.9410509363506006 key: test_precision value: [1. 1. 1. 0.85714286 0.85714286 1. 0.77777778 1. 0.85714286 0.66666667] mean value: 0.9015873015873016 key: train_precision value: [1. 1. 1. 1. 0.97142857 1. 1. 1. 1. 1. 
] mean value: 0.9971428571428571 key: test_recall value: [0.875 0.625 0.71428571 0.85714286 0.85714286 0.42857143 0.875 0.5 0.75 1. ] mean value: 0.7482142857142857 key: train_recall value: [0.94117647 0.85294118 0.79710145 0.95652174 0.98550725 0.88405797 0.92647059 0.79411765 0.88235294 0.91176471] mean value: 0.8932011935208866 key: test_roc_auc value: [0.9375 0.8125 0.85714286 0.86607143 0.86607143 0.71428571 0.79464286 0.75 0.80357143 0.71428571] mean value: 0.8116071428571429 key: train_roc_auc value: [0.97058824 0.92647059 0.89855072 0.97826087 0.97804774 0.94202899 0.96323529 0.89705882 0.94117647 0.95588235] mean value: 0.9451300085251492 key: test_jcc value: [0.875 0.625 0.71428571 0.75 0.75 0.42857143 0.7 0.5 0.66666667 0.66666667] mean value: 0.6676190476190476 key: train_jcc value: [0.94117647 0.85294118 0.79710145 0.95652174 0.95774648 0.88405797 0.92647059 0.79411765 0.88235294 0.91176471] mean value: 0.8904251167705294 MCC on Blind test: 0.81 Accuracy on Blind test: 0.93 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01324773 0.01418781 0.01375175 0.01332355 0.01351047 0.01325345 0.01305151 0.01286387 0.01371384 0.01300216] mean value: 0.013390612602233887 key: score_time value: [0.0118475 0.01155829 0.01162076 0.01165366 0.01165652 0.01157427 0.01155686 0.01157951 0.01150799 0.01157808] mean value: 0.0116133451461792 key: test_mcc value: [0.37796447 0.48038446 0.75592895 0.64465837 0.49099025 0.64465837 0.6000992 0.76376262 0.73214286 0.64465837] mean value: 0.6135247918252648 key: train_mcc value: [0.70321085 0.6799747 0.92951942 0.78854812 0.81433714 0.82543222 0.8978896 0.89869927 0.88938138 0.88654289] mean value: 0.8313535585121352 key: test_accuracy value: [0.625 0.6875 0.86666667 0.8 0.73333333 0.8 0.8 0.86666667 0.86666667 0.8 ] mean value: 0.7845833333333334 key: train_accuracy value: [0.83088235 0.81617647 0.96350365 0.88321168 0.90510949 0.90510949 0.94890511 0.94890511 0.94160584 0.94160584] mean value: 0.9085015027908974 key: test_fscore value: [0.4 0.54545455 0.83333333 0.72727273 0.75 0.72727273 0.82352941 0.85714286 0.875 0.84210526] mean value: 0.7381110865398791 key: train_fscore value: [0.79646018 0.77477477 0.96240602 0.86885246 0.91034483 0.896 0.94814815 0.94964029 0.94444444 0.93846154] mean value: 0.8989532672230035 key: test_precision value: [1. 1. 1. 1. 0.66666667 1. 0.77777778 1. 0.875 0.72727273] mean value: 0.9046717171717171 key: train_precision value: [1. 1. 1. 1. 0.86842105 1. 
0.95522388 0.92957746 0.89473684 0.98387097] mean value: 0.9631830207864525 key: test_recall value: [0.25 0.375 0.71428571 0.57142857 0.85714286 0.57142857 0.875 0.75 0.875 1. ] mean value: 0.6839285714285714 key: train_recall value: [0.66176471 0.63235294 0.92753623 0.76811594 0.95652174 0.8115942 0.94117647 0.97058824 1. 0.89705882] mean value: 0.8566709292412618 key: test_roc_auc value: [0.625 0.6875 0.85714286 0.78571429 0.74107143 0.78571429 0.79464286 0.875 0.86607143 0.78571429] mean value: 0.7803571428571429 key: train_roc_auc value: [0.83088235 0.81617647 0.96376812 0.88405797 0.90473146 0.9057971 0.9488491 0.94906223 0.94202899 0.94128303] mean value: 0.9086636828644501 key: test_jcc value: [0.25 0.375 0.71428571 0.57142857 0.6 0.57142857 0.7 0.75 0.77777778 0.72727273] mean value: 0.6037193362193363 key: train_jcc value: [0.66176471 0.63235294 0.92753623 0.76811594 0.83544304 0.8115942 0.90140845 0.90410959 0.89473684 0.88405797] mean value: 0.8221119914710179 MCC on Blind test: 0.57 Accuracy on Blind test: 0.83 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.11366677 0.0940814 0.10225201 0.10271406 0.09806108 0.09513283 0.09545851 0.09452772 0.09964037 0.09940076] mean value: 0.09949355125427246 key: score_time value: [0.01469493 0.01464295 0.02262974 0.01577759 0.01471615 0.0148375 0.0147779 0.01488066 0.01598573 0.01483297] mean value: 0.01577761173248291 key: test_mcc value: [0.77459667 1. 0.76376262 0.87287156 1. 1. 0.87287156 1. 0.875 1. ] mean value: 0.9159102406955395 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 1. 0.86666667 0.93333333 1. 1. 0.93333333 1. 0.93333333 1. ] mean value: 0.9541666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 1. 0.875 0.92307692 1. 1. 0.94117647 1. 0.93333333 1. ] mean value: 0.9529729584141349 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.77777778 1. 1. 1. 0.88888889 1. 1. 1. ] mean value: 0.9666666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 1. 1. 0.85714286 1. 1. 1. 1. 0.875 1. ] mean value: 0.9482142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 1. 0.875 0.92857143 1. 1. 0.92857143 1. 0.9375 1. ] mean value: 0.9544642857142858 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 1. 0.77777778 0.85714286 1. 1. 0.88888889 1. 0.875 1. 
] mean value: 0.9148809523809524 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0341208 0.03071618 0.06870103 0.02894068 0.03026152 0.03227735 0.03287911 0.0358839 0.04638481 0.04289818] mean value: 0.038306355476379395 key: score_time value: [0.02494836 0.02263951 0.02276754 0.01839662 0.01910567 0.02346301 0.02387643 0.02339864 0.03518057 0.03576827] mean value: 0.024954462051391603 key: test_mcc value: [0.77459667 1. 1. 0.87287156 1. 1. 0.87287156 1. 0.875 1. ] mean value: 0.9395339791129422 key: train_mcc value: [0.98540068 0.98540068 0.98550725 0.98550725 0.98550725 0.98550725 0.97120941 0.98550418 1. 0.98550418] mean value: 0.9855048108412058 key: test_accuracy value: [0.875 1. 1. 0.93333333 1. 1. 0.93333333 1. 0.93333333 1. ] mean value: 0.9675 key: train_accuracy value: [0.99264706 0.99264706 0.99270073 0.99270073 0.99270073 0.99270073 0.98540146 0.99270073 1. 0.99270073] mean value: 0.9926899957063118 key: test_fscore value: [0.85714286 1. 1. 0.92307692 1. 1. 0.94117647 1. 0.93333333 1. 
] mean value: 0.9654729584141348 key: train_fscore value: [0.99259259 0.99259259 0.99270073 0.99270073 0.99270073 0.99270073 0.98507463 0.99259259 1. 0.99259259] mean value: 0.9926247916944072 key: test_precision value: [1. 1. 1. 1. 1. 1. 0.88888889 1. 1. 1. ] mean value: 0.9888888888888889 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 1. 1. 0.85714286 1. 1. 1. 1. 0.875 1. ] mean value: 0.9482142857142857 key: train_recall value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 0.97058824 0.98529412 1. 0.98529412] mean value: 0.98537936913896 key: test_roc_auc value: [0.875 1. 1. 0.92857143 1. 1. 0.92857143 1. 0.9375 1. ] mean value: 0.9669642857142857 key: train_roc_auc value: [0.99264706 0.99264706 0.99275362 0.99275362 0.99275362 0.99275362 0.98529412 0.99264706 1. 0.99264706] mean value: 0.9926896845694799 key: test_jcc value: [0.75 1. 1. 0.85714286 1. 1. 0.88888889 1. 0.875 1. ] mean value: 0.9371031746031746 key: train_jcc value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 0.97058824 0.98529412 1. 
0.98529412] mean value: 0.98537936913896 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.03243661 0.04290581 0.07457685 0.05478549 0.05016804 0.04229784 0.05081439 0.05759549 0.05455852 0.05034065] mean value: 0.05104796886444092 key: score_time value: [0.01870847 0.02562094 0.02191496 0.0251255 0.03725863 0.0223453 0.02001977 0.02464914 0.01784778 0.02232647] mean value: 0.023581695556640626 key: test_mcc value: [0.51639778 0.51639778 0.47245559 0.21821789 0.64465837 0.6000992 0.07142857 0.46770717 0.32732684 0.60714286] mean value: 0.4441832047127614 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75 0.75 0.73333333 0.6 0.8 0.8 0.53333333 0.66666667 0.66666667 0.8 ] mean value: 0.71 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.71428571 0.71428571 0.66666667 0.625 0.72727273 0.76923077 0.53333333 0.54545455 0.70588235 0.8 ] mean value: 0.6801411823470647 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 0.83333333 0.8 0.55555556 1. 0.83333333 0.57142857 1. 
0.66666667 0.85714286] mean value: 0.795079365079365 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.625 0.625 0.57142857 0.71428571 0.57142857 0.71428571 0.5 0.375 0.75 0.75 ] mean value: 0.6196428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.75 0.72321429 0.60714286 0.78571429 0.79464286 0.53571429 0.6875 0.66071429 0.80357143] mean value: 0.7098214285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.55555556 0.55555556 0.5 0.45454545 0.57142857 0.625 0.36363636 0.375 0.54545455 0.66666667] mean value: 0.5212842712842712 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.3 Accuracy on Blind test: 0.68 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, 
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.26349664 0.30767131 0.26317835 0.27116275 0.31355691 0.28043199 0.25030661 0.25096774 0.2339437 0.26458788] mean value: 0.26993038654327395 key: score_time value: [0.01109076 0.01076746 0.00925684 0.01056147 0.01435041 0.01452732 0.009161 0.00908542 0.00923038 0.00994611] mean value: 0.010797715187072754 key: test_mcc value: [0.77459667 1. 
1. 0.87287156 1. 0.87287156 0.75592895 1. 0.875 1. ] mean value: 0.9151268737147877 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 1. 1. 0.93333333 1. 0.93333333 0.86666667 1. 0.93333333 1. ] mean value: 0.9541666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 1. 1. 0.92307692 1. 0.92307692 0.88888889 1. 0.93333333 1. ] mean value: 0.9525518925518925 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 1. 1. 1. 1. 0.8 1. 1. 1. ] mean value: 0.98 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 1. 1. 0.85714286 1. 0.85714286 1. 1. 0.875 1. ] mean value: 0.9339285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 1. 1. 0.92857143 1. 0.92857143 0.85714286 1. 0.9375 1. ] mean value: 0.9526785714285715 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 1. 1. 0.85714286 1. 0.85714286 0.8 1. 0.875 1. ] mean value: 0.9139285714285714 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: 
UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01642203 0.01763225 0.02696109 0.01796865 0.01799726 0.01788306 0.01834369 0.02878118 0.01888704 0.02675176] mean value: 0.02076280117034912 key: score_time value: [0.01238465 0.01226735 0.01235175 0.01354599 0.01394749 0.01234221 0.01371408 0.01333451 0.01352239 0.01251984] mean value: 0.01299302577972412 key: test_mcc value: [ 0.12598816 0. -0.05455447 -0.13363062 -0.64465837 0.04029115 0.49099025 -0.19642857 0.19642857 0.33928571] mean value: 0.016371180845219407 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.5625 0.5 0.46666667 0.46666667 0.2 0.53333333 0.73333333 0.4 0.6 0.66666667] mean value: 0.5129166666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.58823529 0.5 0.5 0.2 0.33333333 0.36363636 0.71428571 0.4 0.625 0.66666667] mean value: 0.4891157372039725 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.55555556 0.5 0.44444444 0.33333333 0.27272727 0.5 0.83333333 0.42857143 0.625 0.71428571] mean value: 0.5207251082251082 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.625 0.5 0.57142857 0.14285714 0.42857143 0.28571429 0.625 0.375 0.625 0.625 ] mean value: 0.48035714285714287 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.5625 0.5 0.47321429 0.44642857 0.21428571 0.51785714 0.74107143 0.40178571 0.59821429 0.66964286] mean value: 0.5125 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.41666667 0.33333333 0.33333333 0.11111111 0.2 0.22222222 0.55555556 0.25 0.45454545 0.5 ] mean value: 0.3376767676767677 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.0 Accuracy on Blind test: 0.49 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, 
reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02977848 0.03747535 0.04021692 0.03384948 0.03854942 0.03706765 0.03542995 0.03389931 0.03391671 0.03379488] mean value: 0.035397815704345706 key: score_time value: [0.02066422 0.02376986 0.02077508 0.02034974 0.02166724 0.02019167 0.02067709 0.02275515 0.02010059 0.02273393] mean value: 0.02136845588684082 key: test_mcc value: [0.62994079 1. 
0.87287156 0.76376262 0.87287156 0.87287156 0.64465837 0.66143783 0.73214286 0.53452248] mean value: 0.7585079626960751 key: train_mcc value: [0.97058824 0.97058824 0.97080136 0.97080136 0.97080136 0.97080136 0.98550418 0.97080136 0.97080136 0.97080136] mean value: 0.9722290198043756 key: test_accuracy value: [0.8125 1. 0.93333333 0.86666667 0.93333333 0.93333333 0.8 0.8 0.86666667 0.73333333] mean value: 0.8679166666666667 key: train_accuracy value: [0.98529412 0.98529412 0.98540146 0.98540146 0.98540146 0.98540146 0.99270073 0.98540146 0.98540146 0.98540146] mean value: 0.9861099184199227 key: test_fscore value: [0.8 1. 0.92307692 0.875 0.92307692 0.92307692 0.84210526 0.76923077 0.875 0.8 ] mean value: 0.8730566801619433 key: train_fscore value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 0.99259259 0.98529412 0.98529412 0.98529412] mean value: 0.9861092166335134 key: test_precision value: [0.85714286 1. 1. 0.77777778 1. 1. 0.72727273 1. 0.875 0.66666667] mean value: 0.8903860028860029 key: train_precision value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 1. 0.98529412 0.98529412 0.98529412] mean value: 0.9868499573742541 key: test_recall value: [0.75 1. 0.85714286 1. 0.85714286 0.85714286 1. 0.625 0.875 1. 
] mean value: 0.8821428571428571 /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:168: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:171: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: train_recall value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 0.98529412 0.98529412 0.98529412 0.98529412] mean value: 0.98537936913896 key: test_roc_auc value: [0.8125 1. 0.92857143 0.875 0.92857143 0.92857143 0.78571429 0.8125 0.86607143 0.71428571] mean value: 0.8651785714285715 key: train_roc_auc value: [0.98529412 0.98529412 0.98540068 0.98540068 0.98540068 0.98540068 0.99264706 0.98540068 0.98540068 0.98540068] mean value: 0.9861040068201194 key: test_jcc value: [0.66666667 1. 
0.85714286 0.77777778 0.85714286 0.85714286 0.72727273 0.625 0.77777778 0.66666667] mean value: 0.7812590187590187 key: train_jcc value: [0.97101449 0.97101449 0.97142857 0.97142857 0.97142857 0.97142857 0.98529412 0.97101449 0.97101449 0.97101449] mean value: 0.9726080867129461 MCC on Blind test: 0.76 Accuracy on Blind test: 0.91 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.20923996 0.30740142 0.23965693 0.30709243 0.33208108 0.24986792 0.23055601 0.22767615 0.24073982 0.21366882] mean value: 0.2557980537414551 key: score_time value: [0.02369094 0.02021146 0.02024269 0.02145529 0.02396369 0.02035141 0.02180147 0.02421188 0.0203321 0.02221036] mean value: 0.021847128868103027 key: test_mcc value: [0.62994079 1. 0.87287156 0.76376262 0.87287156 0.87287156 0.64465837 0.66143783 0.73214286 0.53452248] mean value: 0.7585079626960751 key: train_mcc value: [0.97058824 0.97058824 0.97080136 0.97080136 0.97080136 0.97080136 0.98550418 0.97080136 0.97080136 0.97080136] mean value: 0.9722290198043756 key: test_accuracy value: [0.8125 1. 
0.93333333 0.86666667 0.93333333 0.93333333 0.8 0.8 0.86666667 0.73333333] mean value: 0.8679166666666667 key: train_accuracy value: [0.98529412 0.98529412 0.98540146 0.98540146 0.98540146 0.98540146 0.99270073 0.98540146 0.98540146 0.98540146] mean value: 0.9861099184199227 key: test_fscore value: [0.8 1. 0.92307692 0.875 0.92307692 0.92307692 0.84210526 0.76923077 0.875 0.8 ] mean value: 0.8730566801619433 key: train_fscore value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 0.99259259 0.98529412 0.98529412 0.98529412] mean value: 0.9861092166335134 key: test_precision value: [0.85714286 1. 1. 0.77777778 1. 1. 0.72727273 1. 0.875 0.66666667] mean value: 0.8903860028860029 key: train_precision value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 1. 0.98529412 0.98529412 0.98529412] mean value: 0.9868499573742541 key: test_recall value: [0.75 1. 0.85714286 1. 0.85714286 0.85714286 1. 0.625 0.875 1. ] mean value: 0.8821428571428571 key: train_recall value: [0.98529412 0.98529412 0.98550725 0.98550725 0.98550725 0.98550725 0.98529412 0.98529412 0.98529412 0.98529412] mean value: 0.98537936913896 key: test_roc_auc value: [0.8125 1. 0.92857143 0.875 0.92857143 0.92857143 0.78571429 0.8125 0.86607143 0.71428571] mean value: 0.8651785714285715 key: train_roc_auc value: [0.98529412 0.98529412 0.98540068 0.98540068 0.98540068 0.98540068 0.99264706 0.98540068 0.98540068 0.98540068] mean value: 0.9861040068201194 key: test_jcc value: [0.66666667 1. 
0.85714286 0.77777778 0.85714286 0.85714286 0.72727273 0.625 0.77777778 0.66666667] mean value: 0.7812590187590187 key: train_jcc value: [0.97101449 0.97101449 0.97142857 0.97142857 0.97142857 0.97142857 0.98529412 0.97101449 0.97101449 0.97101449] mean value: 0.9726080867129461 MCC on Blind test: 0.76 Accuracy on Blind test: 0.91 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('num', MinMaxScaler(),
                                                  Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)),
                                                 ('cat', OneHotEncoder(),
                                                  Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
                ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04199767 0.03794289 0.03837681 0.05303359 0.15559244 0.13856936 0.05898094 0.04978466 0.07463431 0.04460335]
mean value: 0.06935160160064698

key: score_time
value: [0.01235557 0.01329517 0.01337838 0.01518774 0.02457047 0.03380871 0.0172708 0.01403213 0.02101779 0.01378417]
mean value: 0.017870092391967775

key: test_mcc
value: [0.9321832 0.93202124 0.8951918 0.82490815 0.71611487 0.78772636 0.89342711 0.89342711 0.78772636 0.80439967]
mean value: 0.8467125880292498

key: train_mcc
value: [0.89746503 0.90927764 0.89754406 0.90138653 0.90163769 0.88213591 0.93313595 0.90551181 0.92125984 0.92520402]
mean value: 0.9074558497858128

key: test_accuracy
value: [0.96491228 0.96491228 0.94736842 0.9122807 0.85714286 0.89285714 0.94642857 0.94642857 0.89285714 0.89285714]
mean value: 0.9218045112781955

key: train_accuracy
value: [0.94871795 0.95463511 0.94871795 0.95069034 0.9507874 0.94094488 0.96653543 0.95275591 0.96062992 0.96259843]
mean value: 0.9537013309726816

key: test_fscore
value: [0.96551724 0.96296296 0.94915254 0.91525424 0.86206897 0.89655172 0.94736842 0.94736842 0.88888889 0.90322581]
mean value: 0.9238359211104228

key: train_fscore
value: [0.9486166 0.95463511 0.94820717 0.95049505 0.95107632 0.94163424 0.9667319 0.95275591 0.96062992 0.96252465]
mean value: 0.9537306872118687

key: test_precision
value: [0.93333333 1. 0.93333333 0.9 0.83333333 0.86666667 0.93103448 0.93103448 0.92307692 0.82352941]
mean value: 0.9075341967025538

key: train_precision
value: [0.95238095 0.95652174 0.95582329 0.95238095 0.94552529 0.93076923 0.96108949 0.95275591 0.96062992 0.96442688]
mean value: 0.9532303658068488

key: test_recall
value: [1. 0.92857143 0.96551724 0.93103448 0.89285714 0.92857143 0.96428571 0.96428571 0.85714286 1. ]
mean value: 0.9432266009852217

key: train_recall
value: [0.94488189 0.95275591 0.94071146 0.9486166 0.95669291 0.95275591 0.97244094 0.95275591 0.96062992 0.96062992]
mean value: 0.9542871370327721

key: test_roc_auc
value: [0.96551724 0.96428571 0.94704433 0.91194581 0.85714286 0.89285714 0.94642857 0.94642857 0.89285714 0.89285714]
mean value: 0.9217364532019705

key: train_roc_auc
value: [0.94872553 0.95463882 0.94870219 0.95068625 0.9507874 0.94094488 0.96653543 0.95275591 0.96062992 0.96259843]
mean value: 0.9537004761756559

key: test_jcc
value: [0.93333333 0.92857143 0.90322581 0.84375 0.75757576 0.8125 0.9 0.9 0.8 0.82352941]
mean value: 0.8602485737696839

key: train_jcc
value: [0.90225564 0.91320755 0.90151515 0.90566038 0.90671642 0.88970588 0.93560606 0.90977444 0.92424242 0.92775665]
mean value: 0.9116440590335693

MCC on Blind test: 0.69
Accuracy on Blind test: 0.88

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL
NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.9814384 1.1654892 1.1827333 1.37071514 1.02586007 1.44153547 1.04063892 1.14810324 0.88139033 0.93454194] mean value: 1.1172446012496948 key: score_time value: [0.01375628 0.02266359 0.02546477 0.01354003 0.01353526 0.01361775 0.01248193 0.0207324 0.01353216 0.01379561] mean value: 0.01631197929382324 key: test_mcc value: [0.86851042 0.8951918 0.93202124 0.89952865 0.82195294 0.85714286 0.93094934 0.96490128 0.85933785 0.93094934] mean value: 0.8960485710881759 key: train_mcc value: [0.99211042 0.98028384 0.98817342 0.98425172 0.98032256 0.99212598 0.99212598 1. 0.99212598 0.98819663] mean value: 0.9889716545512739 key: test_accuracy value: [0.92982456 0.94736842 0.96491228 0.94736842 0.91071429 0.92857143 0.96428571 0.98214286 0.92857143 0.96428571] mean value: 0.9468045112781955 key: train_accuracy value: [0.99605523 0.99013807 0.99408284 0.99211045 0.99015748 0.99606299 0.99606299 1. 
0.99606299 0.99409449] mean value: 0.9944827532653093 key: test_fscore value: [0.93333333 0.94545455 0.96666667 0.95081967 0.9122807 0.92857143 0.96551724 0.98245614 0.93103448 0.96551724] mean value: 0.9481651453779626 key: train_fscore value: [0.99606299 0.99013807 0.99408284 0.99212598 0.99013807 0.99606299 0.99606299 1. 0.99606299 0.99410609] mean value: 0.9944843017488161 key: test_precision value: [0.875 0.96296296 0.93548387 0.90625 0.89655172 0.92857143 0.93333333 0.96551724 0.9 0.93333333] mean value: 0.9237003894686041 key: train_precision value: [0.99606299 0.99209486 0.99212598 0.98823529 0.99209486 0.99606299 0.99606299 1. 0.99606299 0.99215686] mean value: 0.9940959832938809 key: test_recall value: [1. 0.92857143 1. 1. 0.92857143 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.975 key: train_recall value: [0.99606299 0.98818898 0.99604743 0.99604743 0.98818898 0.99606299 0.99606299 1. 0.99606299 0.99606299] mean value: 0.9948787775045906 key: test_roc_auc value: [0.93103448 0.94704433 0.96428571 0.94642857 0.91071429 0.92857143 0.96428571 0.98214286 0.92857143 0.96428571] mean value: 0.9467364532019705 key: train_roc_auc value: [0.99605521 0.99014192 0.99408671 0.9921182 0.99015748 0.99606299 0.99606299 1. 0.99606299 0.99409449] mean value: 0.9944842986523917 key: test_jcc value: [0.875 0.89655172 0.93548387 0.90625 0.83870968 0.86666667 0.93333333 0.96551724 0.87096774 0.93333333] mean value: 0.9021813589173155 key: train_jcc value: [0.99215686 0.98046875 0.98823529 0.984375 0.98046875 0.99215686 0.99215686 1. 
0.99215686 0.98828125] mean value: 0.9890456495098039 MCC on Blind test: 0.81 Accuracy on Blind test: 0.93 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0210526 0.01111865 0.01016021 0.01017928 0.00997281 0.01037288 0.01188016 0.01034904 0.0100739 0.01008177] mean value: 0.011524128913879394 key: score_time value: [0.01222992 0.00975513 0.00912452 0.00903606 0.00887966 0.00971317 0.01039076 0.00900006 0.00902438 0.00893545] mean value: 0.009608912467956542 key: test_mcc value: [0.75492611 0.69397486 0.75462449 0.58076493 0.50128041 0.75047877 0.75047877 0.57142857 0.68250015 0.72168784] mean value: 0.6762144912656113 key: train_mcc value: [0.75216564 0.74385846 0.73986336 0.73178133 0.76786532 0.72461164 0.69640469 0.7442387 0.79728008 0.70868339] mean value: 0.740675262273998 key: test_accuracy value: [0.87719298 0.84210526 0.87719298 0.78947368 0.75 0.875 0.875 0.78571429 0.83928571 0.85714286] mean value: 0.8368107769423558 key: train_accuracy value: [0.87573964 0.87179487 0.86982249 0.86587771 0.88385827 0.86220472 0.84448819 0.87204724 0.8976378 0.85433071] mean value: 0.8697801643137804 key: test_fscore value: [0.87719298 0.82352941 0.88135593 0.78571429 0.75862069 0.87719298 0.87272727 0.78571429 0.83018868 
0.86666667] mean value: 0.8358903188603343 key: train_fscore value: [0.87861272 0.87378641 0.87109375 0.86614173 0.88499025 0.86381323 0.83227176 0.87329435 0.90114068 0.85490196] mean value: 0.8700046844178336 key: test_precision value: [0.86206897 0.91304348 0.86666667 0.81481481 0.73333333 0.86206897 0.88888889 0.78571429 0.88 0.8125 ] mean value: 0.8419099398713341 key: train_precision value: [0.86037736 0.86206897 0.86100386 0.8627451 0.87644788 0.85384615 0.90322581 0.86486486 0.87132353 0.8515625 ] mean value: 0.8667466014073157 key: test_recall value: [0.89285714 0.75 0.89655172 0.75862069 0.78571429 0.89285714 0.85714286 0.78571429 0.78571429 0.92857143] mean value: 0.8333743842364532 key: train_recall value: [0.8976378 0.88582677 0.88142292 0.86956522 0.89370079 0.87401575 0.77165354 0.88188976 0.93307087 0.85826772] mean value: 0.8747051134418474 key: test_roc_auc value: [0.87746305 0.84051724 0.87684729 0.79002463 0.75 0.875 0.875 0.78571429 0.83928571 0.85714286] mean value: 0.8366995073891625 key: train_roc_auc value: [0.87569637 0.87176714 0.86984532 0.86588497 0.88385827 0.86220472 0.84448819 0.87204724 0.8976378 0.85433071] mean value: 0.8697760729513554 key: test_jcc value: [0.78125 0.7 0.78787879 0.64705882 0.61111111 0.78125 0.77419355 0.64705882 0.70967742 0.76470588] mean value: 0.7204184396143599 key: train_jcc value: [0.78350515 0.77586207 0.7716263 0.76388889 0.79370629 0.76027397 0.71272727 0.77508651 0.8200692 0.74657534] mean value: 0.7703321000916057 MCC on Blind test: 0.43 Accuracy on Blind test: 0.78 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01048732 0.01034641 0.01061344 0.01354074 0.01033831 0.01015806 0.01025128 0.01018333 0.01019001 0.01029396] mean value: 0.010640287399291992 key: score_time value: [0.00899577 0.00907207 0.00908756 0.00907373 0.00891471 0.00884938 0.00887942 0.00890255 0.00902867 0.00936127] mean value: 0.00901651382446289 key: test_mcc value: [0.72242731 0.61405719 0.58076493 0.47413793 0.39310793 0.60753044 0.64951905 0.75047877 0.58501794 0.57142857] mean value: 0.5948470056906343 key: train_mcc value: [0.61736329 0.62938349 0.62938349 0.63709364 0.64961133 0.59933628 0.63787438 0.59872224 0.62622211 0.63009708] mean value: 0.6255087330440435 key: test_accuracy value: [0.84210526 0.80701754 0.78947368 0.73684211 0.69642857 0.80357143 0.82142857 0.875 0.78571429 0.78571429] mean value: 0.794329573934837 key: train_accuracy value: [0.8086785 0.81459566 0.81459566 0.81854043 0.82480315 0.7992126 0.81889764 0.7992126 0.81299213 0.81496063] mean value: 0.8126488996567737 key: test_fscore value: [0.86153846 0.8 0.78571429 0.73684211 0.70175439 0.80701754 0.83333333 0.87272727 0.76 0.78571429] mean value: 0.7944641674115358 key: train_fscore value: [0.8086785 0.812749 0.81640625 0.81746032 0.82445759 0.79352227 0.8203125 0.796 0.81553398 0.812749 ] mean value: 0.8117869417892003 key: test_precision value: [0.75675676 0.81481481 0.81481481 0.75 0.68965517 0.79310345 0.78125 0.88888889 0.86363636 0.78571429] mean value: 0.7938634545315579 key: train_precision value: [0.81027668 0.82258065 0.80694981 0.82071713 0.82608696 0.81666667 0.81395349 0.80894309 0.8045977 0.82258065] mean value: 
0.8153352810729207 key: test_recall value: [1. 0.78571429 0.75862069 0.72413793 0.71428571 0.82142857 0.89285714 0.85714286 0.67857143 0.78571429] mean value: 0.8018472906403941 key: train_recall value: [0.80708661 0.80314961 0.82608696 0.81422925 0.82283465 0.77165354 0.82677165 0.78346457 0.82677165 0.80314961] mean value: 0.8085198095297377 key: test_roc_auc value: [0.84482759 0.80665025 0.79002463 0.73706897 0.69642857 0.80357143 0.82142857 0.875 0.78571429 0.78571429] mean value: 0.7946428571428572 key: train_roc_auc value: [0.80868165 0.81461828 0.81461828 0.81853195 0.82480315 0.7992126 0.81889764 0.7992126 0.81299213 0.81496063] mean value: 0.8126528897326569 key: test_jcc value: [0.75675676 0.66666667 0.64705882 0.58333333 0.54054054 0.67647059 0.71428571 0.77419355 0.61290323 0.64705882] mean value: 0.6619268021070678 key: train_jcc value: [0.67880795 0.68456376 0.68976898 0.69127517 0.70134228 0.65771812 0.69536424 0.66112957 0.68852459 0.68456376] mean value: 0.6833058407846723 MCC on Blind test: 0.2 Accuracy on Blind test: 0.69 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01001644 0.01116848 0.01114178 0.01109457 0.01129532 0.01162243 0.0112431 0.01212406 0.01101089 0.01048851] mean value: 0.01112055778503418 key: score_time value: [0.01606083 0.01568818 0.01301765 0.01356721 0.01365757 0.01359797 0.01383781 0.01748276 0.01388144 0.01317573] mean value: 0.01439671516418457 key: test_mcc value: [0.7366424 0.6166424 0.68434084 0.6317806 0.5118907 0.58501794 0.53605627 0.68250015 0.58501794 0.46697379] mean value: 0.6036863009651918 key: train_mcc value: [0.76398832 0.74554603 0.75880927 0.76806178 0.79775247 0.78489793 0.76354997 0.76417218 0.73925749 0.79155948] mean value: 0.767759492902757 key: test_accuracy value: [0.85964912 0.80701754 0.84210526 0.80701754 0.75 0.78571429 0.76785714 0.83928571 0.78571429 0.73214286] mean value: 0.7976503759398497 key: train_accuracy value: [0.87771203 0.86982249 0.87573964 0.8816568 0.8976378 0.88779528 0.87992126 0.87992126 0.86811024 0.89370079] mean value: 0.8812017580642656 key: test_fscore value: [0.87096774 0.79245283 0.84745763 0.83076923 0.77419355 0.80645161 0.77192982 0.84745763 0.76 0.74576271] mean value: 0.8047442754846815 key: train_fscore value: [0.88644689 0.87777778 0.88354898 0.88764045 0.90151515 0.89579525 0.88555347 0.88598131 0.87382298 0.8988764 ] mean value: 0.8876958654685702 key: test_precision value: [0.79411765 0.84 0.83333333 0.75 0.70588235 0.73529412 0.75862069 0.80645161 0.86363636 0.70967742] mean value: 0.7797013536529993 key: train_precision value: [0.82876712 0.82867133 0.82986111 0.84341637 0.86861314 0.83617747 0.84587814 0.84341637 0.83754513 
0.85714286] mean value: 0.841948903606986 key: test_recall value: [0.96428571 0.75 0.86206897 0.93103448 0.85714286 0.89285714 0.78571429 0.89285714 0.67857143 0.78571429] mean value: 0.8400246305418719 key: train_recall value: [0.95275591 0.93307087 0.94466403 0.93675889 0.93700787 0.96456693 0.92913386 0.93307087 0.91338583 0.94488189] mean value: 0.9389296940649218 key: test_roc_auc value: [0.8614532 0.80603448 0.84174877 0.80480296 0.75 0.78571429 0.76785714 0.83928571 0.78571429 0.73214286] mean value: 0.797475369458128 key: train_roc_auc value: [0.87756372 0.86969749 0.87587532 0.88176527 0.8976378 0.88779528 0.87992126 0.87992126 0.86811024 0.89370079] mean value: 0.8811988422395818 key: test_jcc value: [0.77142857 0.65625 0.73529412 0.71052632 0.63157895 0.67567568 0.62857143 0.73529412 0.61290323 0.59459459] mean value: 0.6752116994528734 key: train_jcc value: [0.79605263 0.78217822 0.79139073 0.7979798 0.82068966 0.81125828 0.79461279 0.79530201 0.77591973 0.81632653] mean value: 0.7981710380264788 MCC on Blind test: 0.24 Accuracy on Blind test: 0.7 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.03288817 0.02851772 0.02310157 0.02252603 0.02221966 0.02230144 0.02194214 0.02222419 0.0223546 0.02229118] mean value: 0.024036669731140138 key: score_time value: [0.01327085 0.01344895 0.0130353 0.01236677 0.01236677 0.01230669 0.01217031 0.01236486 0.01230979 0.01227903] mean value: 0.012591934204101563 key: test_mcc value: [0.80817326 0.8951918 0.82880708 0.79682005 0.61065803 0.72168784 0.79385662 0.82195294 0.64450339 0.77459667] mean value: 0.7696247676639518 key: train_mcc value: [0.82431719 0.86987986 0.8390677 0.85106594 0.84756752 0.84464326 0.86681377 0.83968318 0.87412415 0.84725158] mean value: 0.8504414138602343 key: test_accuracy value: [0.89473684 0.94736842 0.9122807 0.89473684 0.80357143 0.85714286 0.89285714 0.91071429 0.82142857 0.875 ] mean value: 0.8809837092731829 key: train_accuracy value: [0.9112426 0.93491124 0.91913215 0.92504931 0.92322835 0.92125984 0.93307087 0.91929134 0.93700787 0.92322835] mean value: 0.924742191989315 key: test_fscore value: [0.90322581 0.94545455 0.91803279 0.90322581 0.81355932 0.86666667 0.9 0.9122807 0.81481481 0.88888889] mean value: 0.8866149339401671 key: train_fscore value: [0.91428571 0.93542074 0.92069632 0.92664093 0.92514395 0.92395437 0.93436293 0.92130518 0.9375 0.92485549] mean value: 0.9264165644110587 key: test_precision value: [0.82352941 0.96296296 0.875 0.84848485 0.77419355 0.8125 0.84375 0.89655172 0.84615385 0.8 ] mean value: 0.8483126341891392 key: train_precision value: [0.88560886 0.92996109 0.90151515 0.90566038 0.90262172 0.89338235 0.91666667 0.8988764 0.93023256 0.90566038] mean 
value: 0.9070185556903059 key: test_recall value: [1. 0.92857143 0.96551724 0.96551724 0.85714286 0.92857143 0.96428571 0.92857143 0.78571429 1. ] mean value: 0.9323891625615763 key: train_recall value: [0.94488189 0.94094488 0.94071146 0.9486166 0.9488189 0.95669291 0.95275591 0.94488189 0.94488189 0.94488189] mean value: 0.9468068220721422 key: test_roc_auc value: [0.89655172 0.94704433 0.91133005 0.89347291 0.80357143 0.85714286 0.89285714 0.91071429 0.82142857 0.875 ] mean value: 0.8809113300492611 key: train_roc_auc value: [0.91117612 0.93489932 0.91917463 0.9250957 0.92322835 0.92125984 0.93307087 0.91929134 0.93700787 0.92322835] mean value: 0.924743238616912 key: test_jcc value: [0.82352941 0.89655172 0.84848485 0.82352941 0.68571429 0.76470588 0.81818182 0.83870968 0.6875 0.8 ] mean value: 0.7986907059820592 key: train_jcc value: [0.84210526 0.87867647 0.85304659 0.86330935 0.86071429 0.85865724 0.87681159 0.85409253 0.88235294 0.86021505] mean value: 0.8629981326609936 MCC on Blind test: 0.66 Accuracy on Blind test: 0.88 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.15022874 2.26373672 2.10535789 1.289114 2.09971786 2.04547882 2.0241437 2.11169195 2.62805915 2.49180293] mean value: 2.120933175086975 key: score_time value: [0.01304388 0.01381183 0.02200556 0.01263905 0.01398277 0.03204012 0.01388478 0.01397943 0.02038717 0.0143764 ] mean value: 0.01701509952545166 key: test_mcc value: [0.96551724 0.89988258 0.96547546 0.83703659 0.79385662 0.89342711 0.96490128 0.96490128 0.93094934 0.8660254 ] mean value: 0.908197290053634 key: train_mcc value: [0.99606293 0.99211042 1. 0.99211042 1. 0.99607071 0.99607071 1. 0.99607071 0.99212598] mean value: 0.9960621896302765 key: test_accuracy value: [0.98245614 0.94736842 0.98245614 0.9122807 0.89285714 0.94642857 0.98214286 0.98214286 0.96428571 0.92857143] mean value: 0.9520989974937343 key: train_accuracy value: [0.99802761 0.99605523 1. 0.99605523 1. 0.9980315 0.9980315 1. 0.9980315 0.99606299] mean value: 0.9980295547376105 key: test_fscore value: [0.98245614 0.94915254 0.98305085 0.92063492 0.9 0.94545455 0.98245614 0.98245614 0.96551724 0.93333333] mean value: 0.9544511851685249 key: train_fscore value: [0.99803536 0.99606299 1. 0.99604743 1. 0.99803536 0.99803536 1. 
0.99803536 0.99606299] mean value: 0.9980314868913049 key: test_precision value: [0.96551724 0.90322581 0.96666667 0.85294118 0.84375 0.96296296 0.96551724 0.96551724 0.93333333 0.875 ] mean value: 0.9234431670023096 key: train_precision value: [0.99607843 0.99606299 1. 0.99604743 1. 0.99607843 0.99607843 1. 0.99607843 0.99606299] mean value: 0.9972487140572204 key: test_recall value: [1. 1. 1. 1. 0.96428571 0.92857143 1. 1. 1. 1. ] mean value: 0.9892857142857143 key: train_recall value: [1. 0.99606299 1. 0.99604743 1. 1. 1. 1. 1. 0.99606299] mean value: 0.9988173415082008 key: test_roc_auc value: [0.98275862 0.94827586 0.98214286 0.91071429 0.89285714 0.94642857 0.98214286 0.98214286 0.96428571 0.92857143] mean value: 0.9520320197044335 key: train_roc_auc value: [0.99802372 0.99605521 1. 0.99605521 1. 0.9980315 0.9980315 1. 0.9980315 0.99606299] mean value: 0.9980291618686005 key: test_jcc value: [0.96551724 0.90322581 0.96666667 0.85294118 0.81818182 0.89655172 0.96551724 0.96551724 0.93333333 0.875 ] mean value: 0.9142452249379882 key: train_jcc value: [0.99607843 0.99215686 1. 0.99212598 1. 0.99607843 0.99607843 1. 
0.99607843 0.99215686] mean value: 0.9960753435232361 MCC on Blind test: 0.8 Accuracy on Blind test: 0.93 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02960896 0.02699208 0.02417684 0.03163671 0.02562284 0.02379894 0.03221059 0.02416635 0.02949929 0.02938581] mean value: 0.02770984172821045 key: score_time value: [0.01288438 0.01086068 0.00921392 0.0142808 0.0104208 0.01039267 0.01026678 0.00972581 0.01522899 0.01317906] mean value: 0.01164538860321045 key: test_mcc value: [0.96551724 1. 0.96547546 0.96547546 0.89342711 0.96490128 0.96490128 0.93094934 0.89342711 0.96490128] mean value: 0.9508975559462645 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 1. 0.98245614 0.98245614 0.94642857 0.98214286 0.98214286 0.96428571 0.94642857 0.98214286] mean value: 0.975093984962406 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 1. 0.98305085 0.98305085 0.94736842 0.98245614 0.98245614 0.96551724 0.94736842 0.98245614] mean value: 0.9756180339803336 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96551724 1. 
0.96666667 0.96666667 0.93103448 0.96551724 0.96551724 0.93333333 0.93103448 0.96551724] mean value: 0.9590804597701149 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 1. 1. 1. 0.96428571 1. ] mean value: 0.9928571428571429 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 1. 0.98214286 0.98214286 0.94642857 0.98214286 0.98214286 0.96428571 0.94642857 0.98214286] mean value: 0.9750615763546799 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 1. 0.96666667 0.96666667 0.9 0.96551724 0.96551724 0.93333333 0.9 0.96551724] mean value: 0.9528735632183908 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, 
max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.16124463 0.13249683 0.12512231 0.123142 0.12262487 0.12115049 0.1241188 0.12215066 0.12319398 0.12175512] mean value: 0.12769997119903564 key: score_time value: [0.02338743 0.01843238 0.02020693 0.01953816 0.01848555 0.01992226 0.01976848 0.01945233 0.0198977 0.0199244 ] mean value: 0.019901561737060546 key: test_mcc value: [1. 1. 0.96547546 0.96547546 0.89342711 0.93094934 1. 0.96490128 0.89342711 0.93094934] mean value: 0.9544605091626567 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 1. 0.98245614 0.98245614 0.94642857 0.96428571 1. 0.98214286 0.94642857 0.96428571] mean value: 0.9768483709273182 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 1. 0.98305085 0.98305085 0.94736842 0.96296296 1. 0.98245614 0.94736842 0.96551724] mean value: 0.9771774881713668 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96666667 0.96666667 0.93103448 1. 1. 0.96551724 0.93103448 0.93333333] mean value: 0.9694252873563218 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 1. 0.98214286 0.98214286 0.94642857 0.96428571 1. 0.98214286 0.94642857 0.96428571] mean value: 0.9767857142857144 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 1. 0.96666667 0.96666667 0.9 0.92857143 1. 0.96551724 0.9 0.93333333] mean value: 0.9560755336617406 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.6 Accuracy on Blind test: 0.88 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01061201 0.01145554 0.01236582 0.01151681 0.01169515 0.01167917 0.01173377 0.01138639 0.01171923 0.0116694 ] mean value: 0.011583328247070312 key: score_time value: [0.00956845 0.00910449 0.01007557 0.00983143 0.00965309 0.00978756 0.00995731 0.00933337 0.01028538 0.00958109] mean value: 0.009717774391174317 key: test_mcc value: [0.80817326 0.86851042 0.89952865 0.86789789 0.64116714 0.85714286 0.89802651 0.8660254 0.89342711 0.83484711] mean value: 0.8434746352564939 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.89473684 0.92982456 0.94736842 0.92982456 0.80357143 0.92857143 0.94642857 0.92857143 0.94642857 0.91071429] mean value: 0.9166040100250626 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.90322581 0.93333333 0.95081967 0.93548387 0.83076923 0.92857143 0.94915254 0.93333333 0.94736842 0.91803279] mean value: 0.9230090425868587 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.82352941 0.875 0.90625 0.87878788 0.72972973 0.92857143 0.90322581 0.875 0.93103448 0.84848485] mean value: 0.8699613586548824 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.89655172 0.93103448 0.94642857 0.92857143 0.80357143 0.92857143 0.94642857 0.92857143 0.94642857 0.91071429] mean value: 0.9166871921182267 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.82352941 0.875 0.90625 0.87878788 0.71052632 0.86666667 0.90322581 0.875 0.9 0.84848485] mean value: 0.8587470927945187 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.53 Accuracy on Blind test: 0.86 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.88762093 1.83851171 1.77748513 1.75392604 1.80487251 1.68222833 1.85387468 1.78196788 1.73018646 1.72484589] mean value: 1.7835519552230834 key: score_time value: [0.10137248 0.10127664 0.10406446 0.0981307 0.09335041 0.10171962 0.10252476 0.09720516 0.09547591 0.09470224] mean value: 0.09898223876953124 key: test_mcc value: [0.96551724 1. 0.96547546 0.96547546 0.89342711 0.93094934 1. 0.96490128 0.89342711 0.96490128] mean value: 0.954407427810863 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 1. 0.98245614 0.98245614 0.94642857 0.96428571 1. 0.98214286 0.94642857 0.98214286] mean value: 0.9768796992481202 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 1. 0.98305085 0.98305085 0.94736842 0.96296296 1. 0.98245614 0.94736842 0.98245614] mean value: 0.9771169921036111 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96551724 1. 
0.96666667 0.96666667 0.93103448 1. 1. 0.96551724 0.93103448 0.96551724] mean value: 0.9691954022988506 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 1. 0.98214286 0.98214286 0.94642857 0.96428571 1. 0.98214286 0.94642857 0.98214286] mean value: 0.9768472906403942 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 1. 0.96666667 0.96666667 0.9 0.92857143 1. 0.96551724 0.9 0.96551724] mean value: 0.9558456486042693 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.79 Accuracy on Blind test: 0.93 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.95136046 0.98831916 0.95716357 0.93647122 0.99228835 1.02972484 0.97567821 0.97857833 0.98092842 0.9740994 ] mean value: 0.9764611959457398 key: score_time value: [0.24091625 0.2508409 0.16018391 0.22381854 0.2695148 0.21784067 0.22880816 0.24814248 0.26897693 0.26349545] mean value: 0.23725380897521972 key: test_mcc value: [0.96551724 0.93202124 0.96547546 1. 0.93094934 0.93094934 1. 0.96490128 0.93094934 0.93094934] mean value: 0.9551712565684981 key: train_mcc value: [0.98046604 0.9685613 0.98046755 0.97660594 0.98437404 0.97665048 0.98050495 0.98437404 0.98050495 0.98050495] mean value: 0.979301423744519 key: test_accuracy value: [0.98245614 0.96491228 0.98245614 1. 0.96428571 0.96428571 1. 0.98214286 0.96428571 0.96428571] mean value: 0.9769110275689223 key: train_accuracy value: [0.99013807 0.98422091 0.99013807 0.98816568 0.99212598 0.98818898 0.99015748 0.99212598 0.99015748 0.99015748] mean value: 0.9895576107720263 key: test_fscore value: [0.98245614 0.96296296 0.98305085 1. 0.96551724 0.96296296 1. 0.98245614 0.96551724 0.96551724] mean value: 0.9770440778223238 key: train_fscore value: [0.99025341 0.984375 0.99021526 0.98828125 0.9921875 0.98832685 0.99025341 0.9921875 0.99025341 0.99025341] mean value: 0.9896587007661066 key: test_precision value: [0.96551724 1. 0.96666667 1. 0.93333333 1. 1. 
0.96551724 0.93333333 0.93333333] mean value: 0.9697701149425287 key: train_precision value: [0.98069498 0.97674419 0.98062016 0.97683398 0.98449612 0.97692308 0.98069498 0.98449612 0.98069498 0.98069498] mean value: 0.9802893565684263 key: test_recall value: [1. 0.92857143 1. 1. 1. 0.92857143 1. 1. 1. 1. ] mean value: 0.9857142857142858 key: train_recall value: [1. 0.99212598 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9992125984251968 key: test_roc_auc value: [0.98275862 0.96428571 0.98214286 1. 0.96428571 0.96428571 1. 0.98214286 0.96428571 0.96428571] mean value: 0.9768472906403941 key: train_roc_auc value: [0.99011858 0.98420528 0.99015748 0.98818898 0.99212598 0.98818898 0.99015748 0.99212598 0.99015748 0.99015748] mean value: 0.9895583704210887 key: test_jcc value: [0.96551724 0.92857143 0.96666667 1. 0.93333333 0.92857143 1. 0.96551724 0.93333333 0.93333333] mean value: 0.9554844006568145 key: train_jcc value: [0.98069498 0.96923077 0.98062016 0.97683398 0.98449612 0.97692308 0.98069498 0.98449612 0.98069498 0.98069498] mean value: 0.9795380148868521 MCC on Blind test: 0.83 Accuracy on Blind test: 0.94 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, 
booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01103711 0.01071525 0.01178098 0.01100349 0.01181769 0.0120666 0.01212502 0.01112056 0.01161766 0.01164937] mean value: 0.011493372917175292 key: score_time value: [0.01010919 0.00939178 0.00976348 0.01021886 0.01012588 0.0094111 0.00998759 0.00937343 0.00957465 0.00951076] mean value: 0.009746670722961426 key: test_mcc value: [0.72242731 0.61405719 0.58076493 0.47413793 0.39310793 0.60753044 0.64951905 0.75047877 0.58501794 0.57142857] mean value: 0.5948470056906343 key: train_mcc value: [0.61736329 0.62938349 0.62938349 0.63709364 0.64961133 0.59933628 0.63787438 0.59872224 0.62622211 0.63009708] mean value: 0.6255087330440435 key: test_accuracy value: [0.84210526 0.80701754 0.78947368 0.73684211 0.69642857 0.80357143 0.82142857 0.875 0.78571429 0.78571429] mean value: 0.794329573934837 key: train_accuracy value: [0.8086785 0.81459566 0.81459566 0.81854043 0.82480315 0.7992126 0.81889764 0.7992126 0.81299213 0.81496063] mean value: 0.8126488996567737 key: test_fscore value: [0.86153846 0.8 0.78571429 0.73684211 0.70175439 0.80701754 0.83333333 0.87272727 0.76 0.78571429] mean value: 0.7944641674115358 key: train_fscore value: [0.8086785 0.812749 0.81640625 0.81746032 0.82445759 0.79352227 0.8203125 0.796 0.81553398 0.812749 ] mean value: 0.8117869417892003 key: test_precision value: [0.75675676 0.81481481 0.81481481 0.75 0.68965517 0.79310345 0.78125 0.88888889 0.86363636 0.78571429] mean value: 0.7938634545315579 key: train_precision value: [0.81027668 0.82258065 0.80694981 0.82071713 0.82608696 0.81666667 0.81395349 0.80894309 0.8045977 0.82258065] mean value: 
0.8153352810729207 key: test_recall value: [1. 0.78571429 0.75862069 0.72413793 0.71428571 0.82142857 0.89285714 0.85714286 0.67857143 0.78571429] mean value: 0.8018472906403941 key: train_recall value: [0.80708661 0.80314961 0.82608696 0.81422925 0.82283465 0.77165354 0.82677165 0.78346457 0.82677165 0.80314961] mean value: 0.8085198095297377 key: test_roc_auc value: [0.84482759 0.80665025 0.79002463 0.73706897 0.69642857 0.80357143 0.82142857 0.875 0.78571429 0.78571429] mean value: 0.7946428571428572 key: train_roc_auc value: [0.80868165 0.81461828 0.81461828 0.81853195 0.82480315 0.7992126 0.81889764 0.7992126 0.81299213 0.81496063] mean value: 0.8126528897326569 key: test_jcc value: [0.75675676 0.66666667 0.64705882 0.58333333 0.54054054 0.67647059 0.71428571 0.77419355 0.61290323 0.64705882] mean value: 0.6619268021070678 key: train_jcc value: [0.67880795 0.68456376 0.68976898 0.69127517 0.70134228 0.65771812 0.69536424 0.66112957 0.68852459 0.68456376] mean value: 0.6833058407846723 MCC on Blind test: 0.2 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, 
random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.0972321 0.07731676 0.07649851 0.0709219 0.07835388 0.07469583 0.0731678 0.22985172 0.07314754 0.074085 ] mean value: 0.09252710342407226 key: score_time value: [0.01207376 0.01114559 0.01213479 0.01066709 0.01160574 0.01146913 0.01133513 0.01144457 0.01144123 0.01136351] mean value: 0.011468052864074707 key: test_mcc value: [0.96551724 1. 0.96547546 0.96547546 0.92857143 0.96490128 1. 0.96490128 0.93094934 0.96490128] mean value: 0.9650692763304416 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 1. 0.98245614 0.98245614 0.96428571 0.98214286 1. 0.98214286 0.96428571 0.98214286] mean value: 0.9822368421052632 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 1. 0.98305085 0.98305085 0.96428571 0.98245614 1. 0.98245614 0.96551724 0.98245614] mean value: 0.9825729211983787 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96551724 1. 0.96666667 0.96666667 0.96428571 0.96551724 1. 0.96551724 0.93333333 0.96551724] mean value: 0.9693021346469622 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.96428571 1. 1. 1. 1. 1. ] mean value: 0.9964285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 1. 0.98214286 0.98214286 0.96428571 0.98214286 1. 
0.98214286 0.96428571 0.98214286] mean value: 0.9822044334975369 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96551724 1. 0.96666667 0.96666667 0.93103448 0.96551724 1. 0.96551724 0.93333333 0.96551724] mean value: 0.9659770114942529 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.045156 0.05143094 0.04170799 0.07557249 0.04485083 0.05265903 0.04891658 0.07405901 0.04442978 0.07700086] mean value: 0.05557835102081299 key: score_time value: [0.01947165 0.01236486 0.01648664 0.01348376 0.02269197 0.01272631 0.02324986 0.01244068 0.01243663 0.01249099] mean value: 0.015784335136413575 key: test_mcc value: [0.96551724 0.85960591 0.79110556 0.89952865 0.85933785 1. 0.82618439 0.89802651 0.89802651 0.8660254 ] mean value: 0.8863358024357655 key: train_mcc value: [0.96055211 0.96847134 0.95661443 0.95661511 0.9645744 0.96062992 0.9645744 0.96850394 0.96062992 0.96062992] mean value: 0.9621795507209824 key: test_accuracy value: [0.98245614 0.92982456 0.89473684 0.94736842 0.92857143 1. 
0.91071429 0.94642857 0.94642857 0.92857143] mean value: 0.9415100250626567 key: train_accuracy value: [0.98027613 0.98422091 0.97830375 0.97830375 0.98228346 0.98031496 0.98228346 0.98425197 0.98031496 0.98031496] mean value: 0.9810868316016711 key: test_fscore value: [0.98245614 0.92857143 0.9 0.95081967 0.92592593 1. 0.91525424 0.94915254 0.94915254 0.93333333] mean value: 0.9434665822346611 key: train_fscore value: [0.98031496 0.98431373 0.97821782 0.97830375 0.98231827 0.98031496 0.98231827 0.98425197 0.98031496 0.98031496] mean value: 0.98109836480702 key: test_precision value: [0.96551724 0.92857143 0.87096774 0.90625 0.96153846 1. 0.87096774 0.90322581 0.90322581 0.875 ] mean value: 0.9185264228263395 key: train_precision value: [0.98031496 0.98046875 0.98015873 0.97637795 0.98039216 0.98031496 0.98039216 0.98425197 0.98031496 0.98031496] mean value: 0.9803301557663748 key: test_recall value: [1. 0.92857143 0.93103448 1. 0.89285714 1. 0.96428571 1. 1. 1. ] mean value: 0.9716748768472907 key: train_recall value: [0.98031496 0.98818898 0.97628458 0.98023715 0.98425197 0.98031496 0.98425197 0.98425197 0.98031496 0.98031496] mean value: 0.9818726463539884 key: test_roc_auc value: [0.98275862 0.92980296 0.89408867 0.94642857 0.92857143 1. 0.91071429 0.94642857 0.94642857 0.92857143] mean value: 0.9413793103448276 key: train_roc_auc value: [0.98027606 0.98421307 0.97829977 0.97830755 0.98228346 0.98031496 0.98228346 0.98425197 0.98031496 0.98031496] mean value: 0.9810860228439825 key: test_jcc value: [0.96551724 0.86666667 0.81818182 0.90625 0.86206897 1. 
0.84375 0.90322581 0.90322581 0.875 ] mean value: 0.8943886304648262 key: train_jcc value: [0.96138996 0.96911197 0.95736434 0.95752896 0.96525097 0.96138996 0.96525097 0.96899225 0.96138996 0.96138996] mean value: 0.962905929184999 MCC on Blind test: 0.61 Accuracy on Blind test: 0.86 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01605272 0.01064062 0.01053286 0.01062799 0.01046133 0.01057339 0.01031685 0.01173782 0.01179981 0.01043272] mean value: 0.011317610740661621 key: score_time value: [0.01308727 0.0094192 0.00935388 0.00926232 0.0092082 0.00946975 0.00921726 0.00949192 0.00977159 0.00906062] mean value: 0.009734201431274413 key: test_mcc value: [0.70694956 0.79682005 0.61405719 0.54433498 0.35805744 0.57735027 0.61065803 0.4645821 0.61706091 0.61065803] mean value: 0.59005285401838 key: train_mcc value: [0.65069271 0.60967718 0.61360065 0.56269586 0.67365136 0.57949966 0.61061966 0.63009708 0.67887215 0.56699945] mean value: 0.6176405758990616 key: test_accuracy value: [0.84210526 0.89473684 0.80701754 0.77192982 0.67857143 0.78571429 0.80357143 0.73214286 0.80357143 0.80357143] mean value: 0.7922932330827067 key: train_accuracy value: [0.82445759 0.80473373 0.80670611 0.78106509 0.83661417 0.78937008 0.80511811 0.81496063 0.83858268 
0.78346457] mean value: 0.8085072760875305 key: test_fscore value: [0.85714286 0.88461538 0.81355932 0.77192982 0.68965517 0.8 0.81355932 0.73684211 0.78431373 0.81355932] mean value: 0.7965177035588487 key: train_fscore value: [0.83111954 0.80776699 0.80859375 0.78529981 0.83945841 0.79462572 0.80851064 0.81712062 0.84410646 0.78515625] mean value: 0.812175819990016 key: test_precision value: [0.77142857 0.95833333 0.8 0.78571429 0.66666667 0.75 0.77419355 0.72413793 0.86956522 0.77419355] mean value: 0.7874233102342838 key: train_precision value: [0.8021978 0.79693487 0.7992278 0.76893939 0.82509506 0.7752809 0.79467681 0.80769231 0.81617647 0.77906977] mean value: 0.7965291168982057 key: test_recall value: [0.96428571 0.82142857 0.82758621 0.75862069 0.71428571 0.85714286 0.85714286 0.75 0.71428571 0.85714286] mean value: 0.812192118226601 key: train_recall value: [0.86220472 0.81889764 0.81818182 0.80237154 0.85433071 0.81496063 0.82283465 0.82677165 0.87401575 0.79133858] mean value: 0.8285907690392456 key: test_roc_auc value: [0.84421182 0.89347291 0.80665025 0.77216749 0.67857143 0.78571429 0.80357143 0.73214286 0.80357143 0.80357143] mean value: 0.7923645320197044 key: train_roc_auc value: [0.82438299 0.80470574 0.8067287 0.78110703 0.83661417 0.78937008 0.80511811 0.81496063 0.83858268 0.78346457] mean value: 0.8085034701689957 key: test_jcc value: [0.75 0.79310345 0.68571429 0.62857143 0.52631579 0.66666667 0.68571429 0.58333333 0.64516129 0.68571429] mean value: 0.6650294813786413 key: train_jcc value: [0.71103896 0.67752443 0.67868852 0.64649682 0.72333333 0.65923567 0.67857143 0.69078947 0.73026316 0.64630225] mean value: 0.6842244043960553 MCC on Blind test: 0.59 Accuracy on Blind test: 0.83 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', 
GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01944375 0.02495122 0.02653742 0.0216639 0.03189659 0.02602649 0.02330875 0.02988434 0.02322054 0.03054476] mean value: 0.02574777603149414 key: score_time value: [0.01058817 0.01182628 0.01213336 0.01234889 0.01249361 0.01251531 0.012465 0.01516986 0.0150609 0.02132058] mean value: 0.013592195510864259 key: test_mcc value: [0.89988258 0.8951918 0.7366424 0.89952865 0.78772636 0.89342711 0.52223297 0.92857143 0.70082556 0.8660254 ] mean value: 0.8130054254678469 key: train_mcc value: [0.92712676 0.95292731 0.93792915 0.90342654 0.96074906 0.95670033 0.6780635 0.97250878 0.89014893 0.96853396] mean value: 0.9148114321896691 key: test_accuracy value: [0.94736842 0.94736842 0.85964912 0.94736842 0.89285714 0.94642857 0.71428571 0.96428571 0.83928571 0.92857143] mean value: 0.8987468671679197 key: train_accuracy value: [0.96252465 0.97633136 0.96844181 0.95069034 0.98031496 0.97834646 0.81496063 0.98622047 0.94291339 0.98425197] mean value: 0.9544996039696222 key: test_fscore value: [0.94915254 0.94545455 0.84615385 0.95081967 0.89655172 0.94545455 0.77777778 0.96428571 0.81632653 0.93333333] mean value: 0.9025310231713968 key: train_fscore value: [0.96380952 0.9766537 0.96761134 0.95219885 0.98046875 0.978389 0.84385382 0.98613861 0.93995859 0.98418972] mean value: 0.9573271907059853 key: test_precision value: [0.90322581 0.96296296 0.95652174 0.90625 0.86666667 0.96296296 
0.63636364 0.96428571 0.95238095 0.875 ] mean value: 0.8986620441204943 key: train_precision value: [0.93357934 0.96538462 0.99170124 0.92222222 0.97286822 0.97647059 0.72988506 0.99203187 0.99126638 0.98809524] mean value: 0.9463504767125346 key: test_recall value: [1. 0.92857143 0.75862069 1. 0.92857143 0.92857143 1. 0.96428571 0.71428571 1. ] mean value: 0.9222906403940887 key: train_recall value: [0.99606299 0.98818898 0.94466403 0.98418972 0.98818898 0.98031496 1. 0.98031496 0.89370079 0.98031496] mean value: 0.973594036911394 key: test_roc_auc value: [0.94827586 0.94704433 0.8614532 0.94642857 0.89285714 0.94642857 0.71428571 0.96428571 0.83928571 0.92857143] mean value: 0.8988916256157635 key: train_roc_auc value: [0.96245837 0.97630793 0.96839501 0.95075628 0.98031496 0.97834646 0.81496063 0.98622047 0.94291339 0.98425197] mean value: 0.9544925461392425 key: test_jcc value: [0.90322581 0.89655172 0.73333333 0.90625 0.8125 0.89655172 0.63636364 0.93103448 0.68965517 0.875 ] mean value: 0.8280465879596859 key: train_jcc value: [0.93014706 0.95437262 0.9372549 0.90875912 0.96168582 0.95769231 0.72988506 0.97265625 0.88671875 0.9688716 ] mean value: 0.920804349269515 MCC on Blind test: 0.76 Accuracy on Blind test: 0.91 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02200437 0.02010751 0.01893258 0.01999164 0.02031374 0.01846647 0.02055311 0.02092481 0.01913881 0.02208757] mean value: 0.020252060890197755 key: score_time value: [0.01222563 0.01219106 0.01215363 0.0128088 0.01377892 0.01212907 0.01220179 0.0122571 0.01221776 0.01227355] mean value: 0.01242372989654541 key: test_mcc value: [0.96551724 0.86789789 0.83703659 0.74822828 0.72168784 0.71611487 0.72168784 0.96490128 0.73127242 0.89802651] mean value: 0.8172370767372027 key: train_mcc value: [0.77941536 0.90393669 0.89862256 0.80222203 0.96463421 0.86255889 0.91852667 0.9645744 0.84762399 0.96137528] mean value: 0.8903490079543654 key: test_accuracy value: [0.98245614 0.92982456 0.9122807 0.85964912 0.85714286 0.85714286 0.85714286 0.98214286 0.85714286 0.94642857] mean value: 0.9041353383458646 key: train_accuracy value: [0.87968442 0.95069034 0.94674556 0.89151874 0.98228346 0.92716535 0.95866142 0.98228346 0.91929134 0.98031496] mean value: 0.9418639053254438 key: test_fscore value: [0.98245614 0.92307692 0.92063492 0.87878788 0.86666667 0.86206897 0.86666667 0.98245614 0.84 0.94915254] mean value: 0.9071966844424932 key: train_fscore value: [0.86474501 0.94887526 0.94934334 0.90196078 0.98238748 0.93186004 0.9596929 0.98231827 0.91295117 0.98069498] mean value: 0.941482922079735 key: test_precision value: [0.96551724 1. 
0.85294118 0.78378378 0.8125 0.83333333 0.8125 0.96551724 0.95454545 0.90322581] mean value: 0.8883864037343394 key: train_precision value: [0.98984772 0.98723404 0.90357143 0.82142857 0.9766537 0.87543253 0.93632959 0.98039216 0.99078341 0.96212121] mean value: 0.9423794347876031 key: test_recall value: [1. 0.85714286 1. 1. 0.92857143 0.89285714 0.92857143 1. 0.75 1. ] mean value: 0.9357142857142857 key: train_recall value: [0.76771654 0.91338583 1. 1. 0.98818898 0.99606299 0.98425197 0.98425197 0.84645669 1. ] mean value: 0.9480314960629921 key: test_roc_auc value: [0.98275862 0.92857143 0.91071429 0.85714286 0.85714286 0.85714286 0.85714286 0.98214286 0.85714286 0.94642857] mean value: 0.9036330049261084 key: train_roc_auc value: [0.8799057 0.95076406 0.94685039 0.89173228 0.98228346 0.92716535 0.95866142 0.98228346 0.91929134 0.98031496] mean value: 0.9419252435342815 key: test_jcc value: [0.96551724 0.85714286 0.85294118 0.78378378 0.76470588 0.75757576 0.76470588 0.96551724 0.72413793 0.90322581] mean value: 0.8339253559923585 key: train_jcc value: [0.76171875 0.90272374 0.90357143 0.82142857 0.96538462 0.87241379 0.92250923 0.96525097 0.83984375 0.96212121] mean value: 0.8916966046361052 MCC on Blind test: 0.71 Accuracy on Blind test: 0.89 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18998051 0.18064475 0.18306255 0.18004155 0.18018317 0.17893577 0.17555499 0.179106 0.17361403 0.17628264] mean value: 0.1797405958175659 key: score_time value: [0.01593828 0.01705146 0.01724339 0.01699996 0.01718593 0.01622796 0.01678467 0.01644039 0.01647544 0.01549101] mean value: 0.016583847999572753 key: test_mcc value: [0.96551724 1. 0.96547546 0.96547546 0.96490128 0.96490128 1. 0.96490128 0.93094934 0.96490128] mean value: 0.9687022616087002 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 1. 0.98245614 0.98245614 0.98214286 0.98214286 1. 0.98214286 0.96428571 0.98214286] mean value: 0.9840225563909775 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 1. 0.98305085 0.98305085 0.98245614 0.98245614 1. 0.98245614 0.96551724 0.98245614] mean value: 0.984389963804895 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96551724 1. 0.96666667 0.96666667 0.96551724 0.96551724 1. 0.96551724 0.93333333 0.96551724] mean value: 0.9694252873563218 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 1. 0.98214286 0.98214286 0.98214286 0.98214286 1. 0.98214286 0.96428571 0.98214286] mean value: 0.9839901477832513 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.96551724 1. 0.96666667 0.96666667 0.96551724 0.96551724 1. 0.96551724 0.93333333 0.96551724] mean value: 0.9694252873563218 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.88 Accuracy on Blind test: 0.96 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. 
This probably means too few estimators were used to compute any reliable oob estimates.
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06405163 0.07544374 0.07800102 0.0659802 0.08432174 0.09123611 0.07245827 0.06432152 0.07139039 0.0639205 ] mean value: 0.07311251163482665 key: score_time value: [0.02064395 0.02726412 0.02178264 0.02308631 0.03058195 0.0379889 0.01936054 0.02460885 0.0287199 0.02765727] mean value: 0.026169443130493165 key: test_mcc value: [0.96551724 0.93202124 0.96547546 1. 0.89342711 0.96490128 0.92857143 0.96490128 0.93094934 0.93094934] mean value: 0.9476713715472729 key: train_mcc value: [1. 0.99211042 1. 0.99214142 1. 1. 0.99212598 1. 0.99607071 0.99215674] mean value: 0.996460528395497 key: test_accuracy value: [0.98245614 0.96491228 0.98245614 1. 0.94642857 0.98214286 0.96428571 0.98214286 0.96428571 0.96428571] mean value: 0.9733395989974937 key: train_accuracy value: [1. 0.99605523 1. 0.99605523 1. 1. 0.99606299 1. 0.9980315 0.99606299] mean value: 0.9982267933963875 key: test_fscore value: [0.98245614 0.96296296 0.98305085 1. 0.94736842 0.98245614 0.96428571 0.98245614 0.96551724 0.96551724] mean value: 0.9736070849570189 key: train_fscore value: [1. 0.99606299 1. 0.99606299 1. 1. 0.99606299 1. 0.99803536 0.99607843] mean value: 0.9982302771208262 key: test_precision value: [0.96551724 1. 0.96666667 1. 0.93103448 0.96551724 0.96428571 0.96551724 0.93333333 0.93333333] mean value: 0.96252052545156 key: train_precision value: [1. 0.99606299 1. 0.99215686 1. 1. 0.99606299 1. 0.99607843 0.9921875 ] mean value: 0.9972548778369615 key: test_recall value: [1. 0.92857143 1. 1. 0.96428571 1. 0.96428571 1. 1. 1. 
] mean value: 0.9857142857142858 key: train_recall value: [1. 0.99606299 1. 1. 1. 1. 0.99606299 1. 1. 1. ] mean value: 0.9992125984251968 key: test_roc_auc value: [0.98275862 0.96428571 0.98214286 1. 0.94642857 0.98214286 0.96428571 0.98214286 0.96428571 0.96428571] mean value: 0.9732758620689655 key: train_roc_auc value: [1. 0.99605521 1. 0.99606299 1. 1. 0.99606299 1. 0.9980315 0.99606299] mean value: 0.9982275683918956 key: test_jcc value: [0.96551724 0.92857143 0.96666667 1. 0.9 0.96551724 0.93103448 0.96551724 0.93333333 0.93333333] mean value: 0.9489490968801314 key: train_jcc value: [1. 0.99215686 1. 0.99215686 1. 1. 0.99215686 1. 0.99607843 0.9921875 ] mean value: 0.9964736519607843 MCC on Blind test: 0.85 Accuracy on Blind test: 0.94 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, 
predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.17205048 0.19767261 0.18131852 0.16802168 0.2220397 0.16332126 0.26538157 0.2428298 0.2475462 0.21324587] mean value: 0.2073427677154541 key: score_time value: [0.03067493 0.02527881 0.0278089 0.01557803 0.02717137 0.0271771 0.02811146 0.03504801 0.04026318 0.02716613] mean value: 0.028427791595458985 key: test_mcc value: [0.77903565 0.8953202 0.96547546 0.80685836 0.76225171 0.85714286 0.8660254 0.93094934 0.89342711 0.77459667] mean value: 0.8531082753337808 key: train_mcc value: [0.98434291 0.98823457 0.98434388 0.98823511 0.99215674 0.99215674 0.98437404 0.98437404 0.99215674 0.98437404] mean value: 0.9874748809761109 key: test_accuracy value: [0.87719298 0.94736842 0.98245614 0.89473684 0.875 0.92857143 0.92857143 0.96428571 0.94642857 0.875 ] mean value: 0.9219611528822055 key: train_accuracy value: [0.99211045 0.99408284 0.99211045 0.99408284 0.99606299 0.99606299 0.99212598 0.99212598 0.99606299 0.99212598] mean value: 0.9936953516905062 key: test_fscore value: [0.88888889 0.94736842 0.98305085 0.90625 0.8852459 0.92857143 0.93333333 0.96551724 0.94736842 0.88888889] mean value: 0.9274483372264085 key: train_fscore value: [0.9921875 0.99412916 0.99215686 0.99410609 0.99607843 0.99607843 0.9921875 0.9921875 0.99607843 0.9921875 ] mean value: 0.9937377405748746 key: test_precision value: [0.8 0.93103448 0.96666667 0.82857143 0.81818182 0.92857143 0.875 0.93333333 0.93103448 0.8 ] mean value: 0.8812393640841917 key: train_precision value: [0.98449612 0.98832685 0.9844358 0.98828125 0.9921875 0.9921875 0.98449612 0.98449612 0.9921875 
0.98449612] mean value: 0.9875590892038428 key: test_recall value: [1. 0.96428571 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9821428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.87931034 0.9476601 0.98214286 0.89285714 0.875 0.92857143 0.92857143 0.96428571 0.94642857 0.875 ] mean value: 0.9219827586206897 key: train_roc_auc value: [0.99209486 0.99407115 0.99212598 0.99409449 0.99606299 0.99606299 0.99212598 0.99212598 0.99606299 0.99212598] mean value: 0.9936953409479942 key: test_jcc value: [0.8 0.9 0.96666667 0.82857143 0.79411765 0.86666667 0.875 0.93333333 0.9 0.8 ] mean value: 0.8664355742296919 key: train_jcc value: [0.98449612 0.98832685 0.9844358 0.98828125 0.9921875 0.9921875 0.98449612 0.98449612 0.9921875 0.98449612] mean value: 0.9875590892038428 MCC on Blind test: 0.38 Accuracy on Blind test: 0.8 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.66994047 0.66400075 0.67747116 0.74977636 0.67632365 0.645926 0.66423178 0.66457534 0.69733119 0.67097402] mean value: 0.6780550718307495 key: score_time value: [0.00979495 0.01027083 0.01547742 0.00971317 0.0095067 0.0092721 0.01001692 0.0094285 0.01009512 0.00964975] mean value: 0.01032254695892334 key: test_mcc value: [0.96551724 1. 0.96547546 1. 0.93094934 0.96490128 0.96490128 0.96490128 0.93094934 0.89802651] mean value: 0.958562172459794 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98245614 1. 0.98245614 1. 0.96428571 0.98214286 0.98214286 0.98214286 0.96428571 0.94642857] mean value: 0.9786340852130325 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98245614 1. 0.98305085 1. 0.96551724 0.98245614 0.98245614 0.98245614 0.96551724 0.94915254] mean value: 0.9793062433992638 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96551724 1. 0.96666667 1. 0.93333333 0.96551724 0.96551724 0.96551724 0.93333333 0.90322581] mean value: 0.9598628105302188 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98275862 1. 0.98214286 1. 0.96428571 0.98214286 0.98214286 0.98214286 0.96428571 0.94642857] mean value: 0.9786330049261084 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.96551724 1. 0.96666667 1. 0.93333333 0.96551724 0.96551724 0.96551724 0.93333333 0.90322581] mean value: 0.9598628105302188 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.87 Accuracy on Blind test: 0.96 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.0364995 0.05236554 0.06891942 0.04245543 0.03577256 0.04559731 0.04729986 0.04501486 0.05125642 0.03263569] mean value: 0.04578166007995606 key: score_time value: [0.02079725 0.01780081 0.01469088 0.01432419 0.01472092 0.01861978 0.02223301 0.02089095 0.01528358 0.01549315] mean value: 0.017485451698303223 key: test_mcc value: [0.9321832 0.96547546 1. 0.96547546 0.96490128 0.93094934 1. 1. 0.96490128 0.96490128] mean value: 0.9688787292752474 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96491228 0.98245614 1. 0.98245614 0.98214286 0.96428571 1. 1. 0.98214286 0.98214286] mean value: 0.9840538847117795 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96551724 0.98181818 1. 0.98305085 0.98181818 0.96296296 1. 1. 0.98181818 0.98245614] mean value: 0.9839441737605323 key: train_fscore value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93333333 1. 1. 0.96666667 1. 1. 1. 1. 1. 0.96551724] mean value: 0.986551724137931 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96428571 1. 1. 0.96428571 0.92857143 1. 1. 0.96428571 1. ] mean value: 0.9821428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96551724 0.98214286 1. 0.98214286 0.98214286 0.96428571 1. 1. 0.98214286 0.98214286] mean value: 0.9840517241379311 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93333333 0.96428571 1. 0.96666667 0.96428571 0.92857143 1. 1. 0.96428571 0.96551724] mean value: 0.9686945812807882 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.05 Accuracy on Blind test: 0.78 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03992987 0.04090619 0.02985525 0.03153992 0.01640797 0.01648879 0.02689624 0.02909088 0.0170362 0.01887155] mean value: 0.026702284812927246 key: score_time value: [0.02908468 0.02943969 0.02950621 0.03354526 0.01250839 0.01258111 0.02266932 0.02381325 0.01700115 0.0155673 ] mean value: 0.022571635246276856 key: test_mcc value: [0.89988258 0.8615634 0.82512315 0.93202124 0.82195294 0.96490128 0.92857143 0.89342711 0.82195294 0.83484711] mean value: 0.8784243193263463 key: train_mcc value: [0.96450468 0.95667331 0.94872473 0.95661511 0.96853396 0.95278544 0.9606597 0.96062992 0.95670033 0.95670033] mean value: 0.9582527517536223 key: test_accuracy value: [0.94736842 0.92982456 0.9122807 0.96491228 0.91071429 0.98214286 0.96428571 0.94642857 0.91071429 0.91071429] mean value: 0.937938596491228 key: train_accuracy value: [0.98224852 0.97830375 0.97435897 0.97830375 0.98425197 0.97637795 0.98031496 0.98031496 0.97834646 0.97834646] mean value: 0.9791167746043579 key: test_fscore value: [0.94915254 0.92592593 0.9122807 0.96666667 0.9122807 0.98245614 0.96428571 0.94736842 0.90909091 0.91803279] mean value: 0.9387540510139624 key: train_fscore value: [0.98224852 0.97847358 0.97425743 0.97830375 0.98431373 0.97647059 0.98039216 0.98031496 0.97830375 0.978389 ] mean value: 0.9791467451988494 key: test_precision value: [0.90322581 0.96153846 0.92857143 0.93548387 0.89655172 0.96551724 0.96428571 0.93103448 0.92592593 0.84848485] mean value: 0.9260619504501596 key: train_precision value: [0.98418972 0.97276265 0.97619048 0.97637795 0.98046875 0.97265625 
0.9765625 0.98031496 0.98023715 0.97647059] mean value: 0.9776231001196349 key: test_recall value: [1. 0.89285714 0.89655172 1. 0.92857143 1. 0.96428571 0.96428571 0.89285714 1. ] mean value: 0.9539408866995074 key: train_recall value: [0.98031496 0.98425197 0.97233202 0.98023715 0.98818898 0.98031496 0.98425197 0.98031496 0.97637795 0.98031496] mean value: 0.9806899878621892 key: test_roc_auc value: [0.94827586 0.92918719 0.91256158 0.96428571 0.91071429 0.98214286 0.96428571 0.94642857 0.91071429 0.91071429] mean value: 0.9379310344827587 key: train_roc_auc value: [0.98225234 0.97829199 0.97435498 0.97830755 0.98425197 0.97637795 0.98031496 0.98031496 0.97834646 0.97834646] mean value: 0.9791159627773801 key: test_jcc value: [0.90322581 0.86206897 0.83870968 0.93548387 0.83870968 0.96551724 0.93103448 0.9 0.83333333 0.84848485] mean value: 0.8856567903731418 key: train_jcc value: [0.96511628 0.95785441 0.94980695 0.95752896 0.96911197 0.95402299 0.96153846 0.96138996 0.95752896 0.95769231] mean value: 0.9591591238303347 MCC on Blind test: 0.76 Accuracy on Blind test: 0.91 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.45232368 0.31628871 0.2526288 0.39126348 0.50805736 0.3298955 0.37645435 0.22237539 0.32188296 0.34562302] mean value: 0.35167932510375977 key: score_time value: [0.01331973 0.02620149 0.01252651 0.02661324 0.02001739 0.02062368 0.02325225 0.03817391 0.01921773 0.02543306] mean value: 0.022537899017333985 key: test_mcc value: [0.89988258 0.8615634 0.82512315 0.93202124 0.82195294 0.96490128 0.92857143 0.89342711 0.82195294 0.83484711] mean value: 0.8784243193263463 key: train_mcc value: [0.96450468 0.95667331 0.94872473 0.95661511 0.96853396 0.95278544 0.9606597 0.96062992 0.95670033 0.95670033] mean value: 0.9582527517536223 key: test_accuracy value: [0.94736842 0.92982456 0.9122807 0.96491228 0.91071429 0.98214286 0.96428571 0.94642857 0.91071429 0.91071429] mean value: 0.937938596491228 key: train_accuracy value: [0.98224852 0.97830375 0.97435897 0.97830375 0.98425197 0.97637795 0.98031496 0.98031496 0.97834646 0.97834646] mean value: 0.9791167746043579 key: test_fscore value: [0.94915254 0.92592593 0.9122807 0.96666667 0.9122807 0.98245614 0.96428571 0.94736842 0.90909091 0.91803279] mean value: 0.9387540510139624 key: train_fscore value: [0.98224852 0.97847358 0.97425743 0.97830375 0.98431373 0.97647059 0.98039216 0.98031496 0.97830375 0.978389 ] mean value: 0.9791467451988494 key: test_precision value: [0.90322581 0.96153846 0.92857143 0.93548387 0.89655172 0.96551724 0.96428571 0.93103448 0.92592593 0.84848485] mean value: 0.9260619504501596 key: train_precision value: [0.98418972 0.97276265 0.97619048 0.97637795 0.98046875 0.97265625 0.9765625 0.98031496 0.98023715 0.97647059] mean value: 0.9776231001196349 key: test_recall value: [1. 0.89285714 0.89655172 1. 0.92857143 1. 0.96428571 0.96428571 0.89285714 1. ] mean value: 0.9539408866995074 key: train_recall value: [0.98031496 0.98425197 0.97233202 0.98023715 0.98818898 0.98031496 0.98425197 0.98031496 0.97637795 0.98031496] mean value: 0.9806899878621892 key: test_roc_auc value: [0.94827586 0.92918719 0.91256158 0.96428571 0.91071429 0.98214286 0.96428571 0.94642857 0.91071429 0.91071429] mean value: 0.9379310344827587 key: train_roc_auc value: [0.98225234 0.97829199 0.97435498 0.97830755 0.98425197 0.97637795 0.98031496 0.98031496 0.97834646 0.97834646] mean value: 0.9791159627773801 key: test_jcc value: [0.90322581 0.86206897 0.83870968 0.93548387 0.83870968 0.96551724 0.93103448 0.9 0.83333333 0.84848485] mean value: 0.8856567903731418 key: train_jcc value: [0.96511628 0.95785441 0.94980695 0.95752896 0.96911197 0.95402299 0.96153846 0.96138996 0.95752896 0.95769231] mean value: 0.9591591238303347 MCC on Blind test: 0.76 Accuracy on Blind test: 0.91
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:188: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./embb_8020.py:191: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)