/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_7030.py:548: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation
or_mychisq          102
log10_or_mychisq    102
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 166
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 173
-------------------------------------------------------------
Successfully split data with stratification [COMPLETE data]: 70/30
Original data size: (424, 173)
Train data size: (284, 173)
Test data size: (140, 173)
y_train numbers: Counter({1: 156, 0: 128})
y_train ratio: 0.8205128205128205
y_test_numbers: Counter({1: 77, 0: 63})
y_test ratio: 0.8181818181818182
-------------------------------------------------------------
index: 0
ind: 1
Mask count check: True
Original Data Counter({1: 156, 0: 128}) Data dim: (284, 173)
Simple Random OverSampling Counter({1: 156, 0: 156}) (312, 173)
Simple Random UnderSampling Counter({0: 128, 1: 128}) (256, 173)
Simple Combined Over and UnderSampling Counter({0: 156, 1: 156}) (312, 173)
SMOTE_NC OverSampling Counter({1: 156, 0: 156}) (312, 173)
#####################################################################
Running ML analysis [COMPLETE DATA]: 70/30 split
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_cd_7030/
Sanity checks:
Total input features: 173
Training data size: (284, 173)
Test data size: (140, 173)
Target feature numbers (training data): Counter({1: 156, 0: 128})
Target features ratio (training data: 0.8205128205128205
Target feature numbers (test data): Counter({1: 77, 0: 63})
Target features ratio (test data): 0.8181818181818182
#####################################################################
================================================================
Strucutral features (n): 34
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 
'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
These are:
['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.03416944 0.03253078 0.03403378 0.03418827 0.03402686 0.03386068 0.03349853 0.03472042 0.03349781 0.03371382]
mean value: 0.033824038505554196

key: score_time
value: [0.01223898 0.01185846 0.0140183 0.01403618 0.01377988 0.01398516 0.01398301 0.01379943 0.01409292 0.01396394]
mean value: 0.0135756254196167

key: test_mcc
value: [0.43855669 0.51675233 0.43855669 0.6505161 0.57054433 0.17660431 0.20672456 0.85641026 0.51681139 0.12595415]
mean value: 0.4497430811646085

key: train_mcc
value: [0.75414242 0.72243454 0.73031782 0.69053483 0.7079253 0.75586888 0.71526337 0.71525557 0.71543078 0.71526337]
mean value: 0.722243686789569

key: test_accuracy
value: [0.72413793 0.75862069 0.72413793 0.82758621 0.78571429 0.60714286 0.60714286 0.92857143 0.75 0.57142857]
mean value: 0.728448275862069

key: train_accuracy
value: [0.87843137 0.8627451 0.86666667 0.84705882 0.85546875 0.87890625 0.859375 0.859375 0.859375 0.859375 ]
mean value: 0.8626776960784314

key: test_fscore
value: [0.76470588 0.77419355 0.76470588 0.84848485 0.83333333 0.68571429 0.64516129 0.93333333 0.74074074 0.64705882]
mean value: 0.7637431968551514

key: train_fscore
value: [0.89122807 0.87804878 0.88111888 0.8641115 0.86925795 0.88888889 0.87412587 0.875 0.87586207 0.87412587]
mean value: 0.8771767886676154

key: test_precision
value: [0.72222222 0.8 0.72222222 0.82352941 0.75 0.63157895 0.625 0.93333333 0.83333333 0.57894737]
mean value: 0.7420166838665291

key: train_precision
value: [0.87586207 0.85714286 0.8630137 0.84353741 0.86013986 0.89208633 0.86206897 0.85714286 0.85234899 0.86206897]
mean value: 0.862541201224554

key: test_recall
value: [0.8125 0.75 0.8125 0.875 0.9375 0.75 0.66666667 0.93333333 0.66666667 0.73333333]
mean value: 0.79375

key: train_recall
value: [0.90714286 0.9 0.9 0.88571429 0.87857143 0.88571429 0.88652482 0.89361702 0.90070922 0.88652482]
mean value: 0.892451874366768

key: test_roc_auc
value: [0.71394231 0.75961538 0.71394231 0.82211538 0.76041667 0.58333333 0.6025641 0.92820513 0.75641026 0.55897436]
mean value: 0.719951923076923

key: train_roc_auc
value: [0.87531056 0.85869565 0.86304348 0.84285714 0.85307882 0.87820197 0.85630589 0.85550416 0.85470244 0.85630589]
mean value: 0.8594005998520496

key: test_jcc
value: [0.61904762 0.63157895 0.61904762 0.73684211 0.71428571 0.52173913 0.47619048 0.875 0.58823529 0.47826087]
mean value: 0.6260227775320655

key: train_jcc
value: [0.80379747 0.7826087 0.7875 0.7607362 0.76875 0.8 0.77639752 0.77777778 0.7791411 0.77639752]
mean value: 0.781310627345378

MCC on Blind test: 0.37
Accuracy on Blind test: 0.69

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.8561759 0.92165303 0.76126075 0.90251565 0.75767636 0.78732276 0.94697428 0.77597547 0.77366757 0.85835242] mean value: 0.8341574192047119 key: score_time value: [0.0150044 0.01200652 0.0120039 0.01198578 0.01203895 0.01211858 0.01436782 0.01202631 0.01211882 0.01197553] mean value: 0.012564659118652344 key: test_mcc value: [0.37799476 0.51675233 0.51308782 0.65896573 0.57054433 0.25819889 0.13091876 0.85641026 0.51681139 0.13091876] mean value: 0.45306030307351824 key: train_mcc value: [0.95253043 0.67476612 0.68252542 0.6267925 0.6525002 0.69995476 0.79449635 0.65192757 0.77878421 0.66766491] mean value: 0.7181942464530867 key: test_accuracy value: [0.68965517 0.75862069 0.75862069 0.82758621 0.78571429 0.64285714 0.57142857 0.92857143 0.75 0.57142857] mean value: 0.728448275862069 key: train_accuracy value: [0.97647059 0.83921569 0.84313725 0.81568627 0.828125 0.8515625 0.8984375 0.828125 0.890625 0.8359375 ] mean value: 0.8607322303921568 
key: test_fscore value: [0.70967742 0.77419355 0.8 0.85714286 0.83333333 0.70588235 0.625 0.93333333 0.74074074 0.625 ] mean value: 0.7604303585233376 key: train_fscore value: [0.9787234 0.85813149 0.86013986 0.83737024 0.84507042 0.86619718 0.90909091 0.84931507 0.90277778 0.85517241] mean value: 0.876198876928773 key: test_precision value: [0.73333333 0.8 0.73684211 0.78947368 0.75 0.66666667 0.58823529 0.93333333 0.83333333 0.58823529] mean value: 0.7419453044375645 key: train_precision value: [0.97183099 0.83221477 0.84246575 0.81208054 0.83333333 0.85416667 0.89655172 0.82119205 0.88435374 0.83221477] mean value: 0.8580404325068907 key: test_recall value: [0.6875 0.75 0.875 0.9375 0.9375 0.75 0.66666667 0.93333333 0.66666667 0.66666667] mean value: 0.7870833333333334 key: train_recall value: [0.98571429 0.88571429 0.87857143 0.86428571 0.85714286 0.87857143 0.92198582 0.87943262 0.92198582 0.87943262] mean value: 0.8952836879432624 key: test_roc_auc value: [0.68990385 0.75961538 0.74519231 0.81490385 0.76041667 0.625 0.56410256 0.92820513 0.75641026 0.56410256] mean value: 0.7207852564102564 key: train_roc_auc value: [0.97546584 0.83416149 0.83928571 0.81040373 0.82512315 0.84876847 0.89577552 0.82232501 0.88707986 0.83102066] mean value: 0.8569409444214063 key: test_jcc value: [0.55 0.63157895 0.66666667 0.75 0.71428571 0.54545455 0.45454545 0.875 0.58823529 0.45454545] mean value: 0.6230312076983904 key: train_jcc value: [0.95833333 0.75151515 0.75460123 0.7202381 0.73170732 0.76397516 0.83333333 0.73809524 0.82278481 0.74698795] mean value: 0.7821571612795502 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01298046 0.0125916 0.00941372 0.00909805 0.01014328 0.00974369 0.01018119 0.00992966 0.01014757 0.00922632] mean value: 0.010345554351806641 key: score_time value: [0.01201987 0.00916719 0.00919127 0.00949001 0.00955749 0.00926924 0.00949407 0.00916719 0.00916576 0.00878835] mean value: 0.009531044960021972 key: test_mcc value: [0.58145719 0.1527557 0.2956562 0.36894943 0.66666667 0.55943093 0.03739788 0.78555332 0.35228194 0.35228194] mean value: 0.415243118729667 key: train_mcc value: [0.49498371 0.48735671 0.47578012 0.45997473 0.47938449 0.51209302 0.54826324 0.51261282 0.47777839 0.5254541 ] mean value: 0.49736813279152425 key: test_accuracy value: [0.79310345 0.5862069 0.65517241 0.68965517 0.82142857 0.78571429 0.53571429 0.89285714 0.67857143 0.67857143] mean value: 0.7116995073891625 key: train_accuracy value: [0.74901961 0.74117647 0.74117647 0.73333333 0.7421875 0.7578125 0.77734375 0.7578125 0.7421875 0.765625 ] mean value: 0.7507674632352941 key: test_fscore value: [0.82352941 0.64705882 0.70588235 0.72727273 0.86486486 0.82352941 0.64864865 0.90322581 0.72727273 0.72727273] mean value: 0.7598557501783308 key: train_fscore value: [0.76811594 0.79375 0.78145695 0.77631579 0.78289474 0.79605263 0.80677966 0.8 0.78431373 0.8013245 ] mean value: 0.789100394338451 key: test_precision value: [0.77777778 0.61111111 0.66666667 0.70588235 0.76190476 0.77777778 0.54545455 0.875 0.66666667 0.66666667] mean value: 0.7054908326967151 key: train_precision value: [0.77941176 0.70555556 0.72839506 0.7195122 0.72560976 0.73780488 0.77272727 0.73372781 
0.72727273 0.7515528 ] mean value: 0.7381569816940069 key: test_recall value: [0.875 0.6875 0.75 0.75 1. 0.875 0.8 0.93333333 0.8 0.8 ] mean value: 0.8270833333333334 key: train_recall value: [0.75714286 0.90714286 0.84285714 0.84285714 0.85 0.86428571 0.84397163 0.87943262 0.85106383 0.85815603] mean value: 0.8496909827760891 key: test_roc_auc value: [0.78365385 0.57451923 0.64423077 0.68269231 0.79166667 0.77083333 0.51538462 0.88974359 0.66923077 0.66923077] mean value: 0.6991185897435898 key: train_roc_auc value: [0.74813665 0.72313665 0.73012422 0.72142857 0.73103448 0.74679803 0.7698119 0.74406414 0.72987974 0.75516497] mean value: 0.7399579351661555 key: test_jcc value: [0.7 0.47826087 0.54545455 0.57142857 0.76190476 0.7 0.48 0.82352941 0.57142857 0.57142857] mean value: 0.6203435302974944 key: train_jcc value: [0.62352941 0.65803109 0.64130435 0.6344086 0.64324324 0.66120219 0.67613636 0.66666667 0.64516129 0.66850829] mean value: 0.6518191486778253 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0094049 0.0092361 0.00938892 0.0094986 0.01036501 0.01064968 0.00943542 0.00975943 0.01038313 0.00947523] mean value: 0.009759640693664551 key: score_time value: [0.00875306 0.00907779 0.00919843 0.00939655 0.00899863 0.00959301 0.00873184 0.00939775 0.00951385 0.00956416] mean value: 0.00922250747680664 key: test_mcc value: [0.36720991 0.51675233 0.29458249 0.51308782 0.55943093 0.17660431 0.20672456 0.64084613 0.42564103 0.27928963] mean value: 0.3980169135730911 key: train_mcc value: [0.52291851 0.49871807 0.53884849 0.47460238 0.52523163 0.53332692 0.56453014 0.49226967 0.540165 0.55613108] mean value: 0.524674189335105 key: test_accuracy value: [0.68965517 0.75862069 0.65517241 0.75862069 0.78571429 0.60714286 0.60714286 0.82142857 0.71428571 0.64285714] mean value: 0.704064039408867 key: train_accuracy value: [0.76470588 0.75294118 0.77254902 0.74117647 0.765625 0.76953125 0.78515625 0.75 0.7734375 0.78125 ] mean value: 0.7656372549019608 key: test_fscore value: [0.74285714 0.77419355 0.72222222 0.8 0.82352941 0.68571429 0.64516129 0.83870968 0.73333333 0.70588235] mean value: 0.7471603264961899 key: train_fscore value: [0.79591837 0.78350515 0.80136986 0.7739726 0.79310345 0.79863481 0.80836237 0.7852349 0.80136986 0.80821918] mean value: 0.7949690558064818 key: test_precision value: [0.68421053 0.8 0.65 0.73684211 0.77777778 0.63157895 0.625 0.8125 0.73333333 0.63157895] mean value: 0.70828216374269 key: train_precision value: [0.75974026 0.75496689 0.76973684 0.74342105 0.76666667 0.76470588 0.79452055 0.74522293 0.77483444 0.78145695] mean value: 
0.7655272459523916 key: test_recall value: [0.8125 0.75 0.8125 0.875 0.875 0.75 0.66666667 0.86666667 0.73333333 0.8 ] mean value: 0.7941666666666667 key: train_recall value: [0.83571429 0.81428571 0.83571429 0.80714286 0.82142857 0.83571429 0.82269504 0.82978723 0.82978723 0.83687943] mean value: 0.8269148936170213 key: test_roc_auc value: [0.67548077 0.75961538 0.63701923 0.74519231 0.77083333 0.58333333 0.6025641 0.81794872 0.71282051 0.63076923] mean value: 0.6935576923076923 key: train_roc_auc value: [0.75698758 0.74627329 0.76568323 0.73400621 0.75985222 0.76268473 0.78091274 0.74098057 0.76706753 0.77496146] mean value: 0.7589409550543877 key: test_jcc value: [0.59090909 0.63157895 0.56521739 0.66666667 0.7 0.52173913 0.47619048 0.72222222 0.57894737 0.54545455] mean value: 0.5998925838971605 key: train_jcc value: [0.66101695 0.6440678 0.66857143 0.63128492 0.65714286 0.66477273 0.67836257 0.64640884 0.66857143 0.67816092] mean value: 0.6598360435940921 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00950718 0.01232815 0.01000333 0.00981069 0.00915599 0.00976539 0.01029205 0.00983381 0.00999928 0.00985742] mean value: 0.010055327415466308 key: score_time value: [0.04921603 0.02551436 0.01085448 0.01097107 0.01046705 0.01105928 0.01091576 0.01097822 0.01087236 0.01105428] mean value: 0.016190290451049805 key: test_mcc value: [ 0.29458249 -0.11538462 0.02403846 0.29458249 0.27083333 0.25819889 0.05337605 0.65118783 -0.01571025 0.13091876] mean value: 0.18466234484517727 key: train_mcc value: [0.49947602 0.52489912 0.54682896 0.51496625 0.54964568 0.57461786 0.57748217 0.532427 0.57422132 0.55021867] mean value: 0.5444783048862579 key: test_accuracy value: [0.65517241 0.44827586 0.51724138 0.65517241 0.64285714 0.64285714 0.53571429 0.82142857 0.5 0.57142857] mean value: 0.5990147783251232 key: train_accuracy value: [0.75294118 0.76470588 0.77647059 0.76078431 0.77734375 0.7890625 0.7890625 0.76953125 0.7890625 0.77734375] mean value: 0.7746308210784314 key: test_fscore value: [0.72222222 0.5 0.5625 0.72222222 0.6875 0.70588235 0.60606061 0.84848485 0.5625 0.625 ] mean value: 0.6542372251931076 key: train_fscore value: [0.78929766 0.8013245 0.80412371 0.79322034 0.80677966 0.81879195 0.82467532 0.8013468 0.82119205 0.81188119] mean value: 0.8072633186944136 key: test_precision value: [0.65 0.5 0.5625 0.65 0.6875 0.66666667 0.55555556 0.77777778 0.52941176 0.58823529] mean value: 0.616764705882353 key: train_precision value: [0.74213836 0.74691358 0.77483444 0.75483871 0.76774194 0.7721519 0.76047904 0.76282051 0.77018634 0.75925926] mean value: 
0.7611364075408015 key: test_recall value: [0.8125 0.5 0.5625 0.8125 0.6875 0.75 0.66666667 0.93333333 0.6 0.66666667] mean value: 0.6991666666666667 key: train_recall value: [0.84285714 0.86428571 0.83571429 0.83571429 0.85 0.87142857 0.90070922 0.84397163 0.87943262 0.87234043] mean value: 0.859645390070922 key: test_roc_auc value: [0.63701923 0.44230769 0.51201923 0.63701923 0.63541667 0.625 0.52564103 0.81282051 0.49230769 0.56410256] mean value: 0.5883653846153846 key: train_roc_auc value: [0.7431677 0.75388199 0.77003106 0.75263975 0.76982759 0.78054187 0.77644157 0.76111625 0.77884675 0.766605 ] mean value: 0.7653099514072751 key: test_jcc value: [0.56521739 0.33333333 0.39130435 0.56521739 0.52380952 0.54545455 0.43478261 0.73684211 0.39130435 0.45454545] mean value: 0.49418110493625367 key: train_jcc value: [0.6519337 0.66850829 0.67241379 0.65730337 0.67613636 0.69318182 0.70165746 0.66853933 0.69662921 0.68333333] mean value: 0.6769636665881136 MCC on Blind test: 0.16 Accuracy on Blind test: 0.59 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, 
gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01607466 0.01531768 0.01556444 0.01464987 0.01540494 0.01492763 0.01399732 0.01559043 0.01450849 0.01498008] mean value: 0.01510155200958252 key: score_time value: [0.0113318 0.01088405 0.01098084 0.01086926 0.01097679 0.01084137 0.01012707 0.01082325 0.01090598 0.01094651] mean value: 0.010868692398071289 key: test_mcc value: [0.51308782 0.44230769 0.4444578 0.46375229 0.41079192 0.17660431 0.35143175 0.78555332 0.52084744 0.20380987] mean value: 0.4312644198027845 key: train_mcc value: [0.66873453 0.685946 0.71495683 0.67730051 0.68762435 0.77251235 0.66054755 0.68495043 0.6910643 0.68385974] mean value: 0.6927496604406052 key: test_accuracy value: [0.75862069 0.72413793 0.72413793 0.72413793 0.71428571 0.60714286 0.67857143 0.89285714 0.75 0.60714286] mean value: 0.718103448275862 key: train_accuracy value: [0.83529412 0.84313725 0.85882353 0.83921569 0.84375 0.88671875 0.83203125 0.84375 0.84375 0.84375 ] mean value: 0.8470220588235294 key: test_fscore value: [0.8 0.75 0.77777778 0.78947368 0.77777778 0.68571429 0.70967742 0.90322581 0.8 0.68571429] mean value: 0.7679361037001105 key: train_fscore value: [0.85810811 0.86577181 0.87586207 0.86195286 0.86577181 0.90034364 0.85423729 0.86486486 0.86928105 0.8630137 ] mean value: 0.8679207203181474 key: test_precision value: [0.73684211 0.75 0.7 0.68181818 0.7 0.63157895 0.6875 0.875 0.7 0.6 ] mean value: 0.7062739234449761 key: train_precision value: [0.81410256 0.8164557 0.84666667 0.81528662 0.8164557 0.86754967 0.81818182 0.82580645 0.80606061 0.83443709] mean value: 0.8261002878200331 key: test_recall value: 
[0.875 0.75 0.875 0.9375 0.875 0.75 0.73333333 0.93333333 0.93333333 0.8 ] mean value: 0.84625 key: train_recall value: [0.90714286 0.92142857 0.90714286 0.91428571 0.92142857 0.93571429 0.89361702 0.90780142 0.94326241 0.89361702] mean value: 0.9145440729483283 key: test_roc_auc value: [0.74519231 0.72115385 0.70673077 0.69951923 0.6875 0.58333333 0.67435897 0.88974359 0.73589744 0.59230769] mean value: 0.7035737179487179 key: train_roc_auc value: [0.82748447 0.83462733 0.85357143 0.8310559 0.83571429 0.88165025 0.82506938 0.8365094 0.83250077 0.83811286] mean value: 0.839629607688557 key: test_jcc value: [0.66666667 0.6 0.63636364 0.65217391 0.63636364 0.52173913 0.55 0.82352941 0.66666667 0.52173913] mean value: 0.6275242191738355 key: train_jcc value: [0.75147929 0.76331361 0.7791411 0.75739645 0.76331361 0.81875 0.74556213 0.76190476 0.76878613 0.75903614] mean value: 0.7668683226702581 MCC on Blind test: 0.29 Accuracy on Blind test: 0.65 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.21098614 1.15482354 1.30495906 1.1236217 1.30449224 1.08739567 1.25559568 1.10417318 1.2485714 1.14837241] mean value: 1.1942991018295288 key: score_time value: [0.01472187 0.01449966 0.01498246 0.01457214 0.01606774 0.01229334 0.01477671 0.01566529 0.01566625 0.01275349] mean value: 0.014599895477294922 key: test_mcc value: [0.44230769 0.44230769 0.37799476 0.58145719 0.41666667 0.33776026 0.29230769 0.71743483 0.51681139 0.27928963] mean value: 0.44043378142183065 key: train_mcc value: [0.97625444 0.96839557 0.98416149 0.98426071 0.97636757 0.97636757 0.97632969 0.98430987 0.96880891 0.97653632] mean value: 0.977179212771556 key: test_accuracy value: [0.72413793 0.72413793 0.68965517 0.79310345 0.71428571 0.67857143 0.64285714 0.85714286 0.75 0.64285714] mean value: 0.7216748768472907 key: train_accuracy value: [0.98823529 0.98431373 0.99215686 0.99215686 0.98828125 0.98828125 0.98828125 0.9921875 0.984375 0.98828125] mean value: 0.9886550245098039 key: test_fscore value: [0.75 0.75 0.70967742 0.82352941 0.75 0.72727273 0.64285714 0.875 0.74074074 0.70588235] mean value: 0.7474959794931332 key: train_fscore value: 
[0.98932384 0.9858156 0.99285714 0.9929078 0.98932384 0.98932384 0.98939929 0.99295775 0.98601399 0.98947368] mean value: 0.9897396787351177 key: test_precision value: [0.75 0.75 0.73333333 0.77777778 0.75 0.70588235 0.69230769 0.82352941 0.83333333 0.63157895] mean value: 0.744774284882644 key: train_precision value: [0.9858156 0.97887324 0.99285714 0.98591549 0.9858156 0.9858156 0.98591549 0.98601399 0.97241379 0.97916667] mean value: 0.9838602622503995 key: test_recall value: [0.75 0.75 0.6875 0.875 0.75 0.75 0.6 0.93333333 0.66666667 0.8 ] mean value: 0.75625 key: train_recall value: [0.99285714 0.99285714 0.99285714 1. 0.99285714 0.99285714 0.9929078 1. 1. 1. ] mean value: 0.9957193515704155 key: test_roc_auc value: [0.72115385 0.72115385 0.68990385 0.78365385 0.70833333 0.66666667 0.64615385 0.85128205 0.75641026 0.63076923] mean value: 0.7175480769230769 key: train_roc_auc value: [0.98773292 0.98338509 0.99208075 0.99130435 0.98780788 0.98780788 0.98775825 0.99130435 0.9826087 0.98695652] mean value: 0.9878746682889559 key: test_jcc value: [0.6 0.6 0.55 0.7 0.6 0.57142857 0.47368421 0.77777778 0.58823529 0.54545455] mean value: 0.6006580399304857 key: train_jcc value: [0.97887324 0.97202797 0.9858156 0.98591549 0.97887324 0.97887324 0.97902098 0.98601399 0.97241379 0.97916667] mean value: 0.9796994210937537 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), 
('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02705932 0.01950145 0.02408624 0.02397561 0.01918578 0.02874064 0.01892877 0.01916695 0.0216887 0.03065681] mean value: 0.023299026489257812 key: score_time value: [0.0120914 0.00884175 0.01273632 0.00924754 0.00980282 0.00902367 0.00927806 0.00869584 0.01418424 0.01258039] mean value: 0.010648202896118165 key: test_mcc value: [0.30288462 0.51675233 0.10047962 0.68473679 0.57735027 0.6333005 0.3721042 0.72307692 0.64450339 0.64450339] mean value: 0.5199692014733809 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.65517241 0.75862069 0.55172414 0.82758621 0.78571429 0.82142857 0.67857143 0.85714286 0.82142857 0.82142857] mean value: 0.7578817733990147 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.6875 0.77419355 0.58064516 0.86486486 0.8 0.84848485 0.66666667 0.85714286 0.82758621 0.82758621] mean value: 0.773467036062976 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6875 0.8 0.6 0.76190476 0.85714286 0.82352941 0.75 0.92307692 0.85714286 0.85714286] mean value: 0.7917439668174963 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6875 0.75 0.5625 1. 0.75 0.875 0.6 0.8 0.8 0.8 ] mean value: 0.7625 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.65144231 0.75961538 0.55048077 0.80769231 0.79166667 0.8125 0.68461538 0.86153846 0.82307692 0.82307692] mean value: 0.7565705128205128 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.52380952 0.63157895 0.40909091 0.76190476 0.66666667 0.73684211 0.5 0.75 0.70588235 0.70588235] mean value: 0.6391657619985793 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.11053491 0.10580492 0.10976958 0.11027527 0.1128788 0.14129496 0.10483551 0.10519671 0.1059289 0.1059742 ] mean value: 0.11124937534332276 key: score_time value: [0.01880836 0.01879287 0.01911926 0.01787305 0.02713537 0.01979661 0.01765728 0.01798964 0.01790142 0.01904774] mean value: 0.01941215991973877 key: test_mcc value: [0.30288462 0.45455066 0.29458249 0.43855669 0.41666667 0.10758287 0.21483446 0.71743483 0.27754778 0.35143175] mean value: 0.3576072822776395 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.65517241 0.72413793 0.65517241 0.72413793 0.71428571 0.57142857 0.60714286 0.85714286 0.64285714 0.67857143] mean value: 0.6830049261083744 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.6875 0.73333333 0.72222222 0.76470588 0.75 0.64705882 0.62068966 0.875 0.6875 0.70967742] mean value: 0.719768733596516 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6875 0.78571429 0.65 0.72222222 0.75 0.61111111 0.64285714 0.82352941 0.64705882 0.6875 ] mean value: 0.700749299719888 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6875 0.6875 0.8125 0.8125 0.75 0.6875 0.6 0.93333333 0.73333333 0.73333333] mean value: 0.74375 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.65144231 0.72836538 0.63701923 0.71394231 0.70833333 0.55208333 0.60769231 0.85128205 0.63589744 0.67435897] mean value: 0.6760416666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.52380952 0.57894737 0.56521739 0.61904762 0.6 0.47826087 0.45 0.77777778 0.52380952 0.55 ] mean value: 0.5666870073735062 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.25 Accuracy on Blind test: 0.64 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01011157 0.01035905 0.00959778 0.0101676 0.01034927 0.01034713 0.01018667 0.00985789 0.00959349 0.01052213] mean value: 0.010109257698059083 key: score_time value: [0.00964332 0.00898647 0.00881195 0.00940514 0.0092485 0.0096724 0.00936341 0.00948358 0.00939012 0.00940371] mean value: 0.009340858459472657 key: test_mcc value: [ 0.31579309 0.03827795 0.21932975 0.45455066 -0.10555008 0.27083333 -0.01571025 0.43589744 0.51681139 0.4241768 ] mean value: 0.2554410075983323 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.65517241 0.51724138 0.62068966 0.72413793 0.46428571 0.64285714 0.5 0.71428571 0.75 0.71428571] mean value: 0.6302955665024631 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.53333333 0.7027027 0.73333333 0.54545455 0.6875 0.5625 0.71428571 0.74074074 0.75 ] mean value: 0.6636517036517037 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.71428571 0.57142857 0.61904762 0.78571429 0.52941176 0.6875 0.52941176 0.76923077 0.83333333 0.70588235] mean value: 0.6745246175393235 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.625 0.5 0.8125 0.6875 0.5625 0.6875 0.6 0.66666667 0.66666667 0.8 ] mean value: 0.6608333333333334 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.65865385 0.51923077 0.59855769 0.72836538 0.44791667 0.63541667 0.49230769 0.71794872 0.75641026 0.70769231] mean value: 0.62625 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.36363636 0.54166667 0.57894737 0.375 0.52380952 0.39130435 0.55555556 0.58823529 0.6 ] mean value: 0.5018155120032897 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.45265174 1.38820577 1.40463042 1.39919519 1.42965603 1.43548608 1.42969394 1.42850542 1.45230269 1.45057559] mean value: 1.4270902872085571 key: score_time value: [0.09057522 0.09434032 0.09046865 0.15521455 0.09838867 0.09792662 0.09741545 0.09816027 0.09900284 0.09744287] mean value: 0.10189354419708252 key: test_mcc value: [0.58145719 0.37799476 0.4444578 0.43855669 0.48553038 0.25819889 0.42564103 0.93094934 0.57948718 0.50128041] mean value: 0.5023553662949708 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.79310345 0.68965517 0.72413793 0.72413793 0.75 0.64285714 0.71428571 0.96428571 0.78571429 0.75 ] mean value: 0.7538177339901478 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 0.70967742 0.77777778 0.76470588 0.78787879 0.70588235 0.73333333 0.96551724 0.78571429 0.75862069] mean value: 0.781263718215233 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.77777778 0.73333333 0.7 0.72222222 0.76470588 0.66666667 0.73333333 1. 0.84615385 0.78571429] mean value: 0.7729907347554406 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.6875 0.875 0.8125 0.8125 0.75 0.73333333 0.93333333 0.73333333 0.73333333] mean value: 0.7945833333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.78365385 0.68990385 0.70673077 0.71394231 0.73958333 0.625 0.71282051 0.96666667 0.78974359 0.75128205] mean value: 0.7479326923076923 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: test_jcc value: [0.7 0.55 0.63636364 0.61904762 0.65 0.54545455 0.57894737 0.93333333 0.64705882 0.61111111] mean value: 0.647131643726071 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.46 Accuracy on Blind test: 0.74 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None,
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [1.7486918 0.90479875 0.92124152 0.9011457 0.96960497 0.88509083 0.91266322 0.88183689 0.89421272 0.90750265] mean value: 0.9926789045333863 key: score_time value: [0.19227123 0.21386743 0.26786327 0.23550534 0.25344205 0.12532926 0.20549893 0.20171428 0.23566985 0.26774335] mean value: 0.21989049911499023 key: test_mcc value: [0.50973276 0.51675233 0.72435769 0.4444578 0.55943093 0.25819889 0.42564103 0.93094934 0.64450339 0.42564103] mean value: 0.5439665164350693 key: train_mcc value: [0.88962581 0.88962581 0.88144491 0.88193307 0.90602026 0.889773 0.90584149 0.88995933 0.88180723 0.88155407] mean value: 0.889758497328895 key: test_accuracy value: [0.75862069 0.75862069 0.86206897 0.72413793 0.78571429 0.64285714 0.71428571 0.96428571 0.82142857 0.71428571] mean value: 0.7746305418719212 key: train_accuracy value: [0.94509804 0.94509804 0.94117647 0.94117647 0.953125 0.9453125 0.953125 0.9453125 0.94140625 0.94140625] mean value: 0.9452236519607843 key: test_fscore value: [0.78787879 0.77419355 0.88235294 0.77777778 0.82352941 0.70588235 0.73333333 0.96551724 0.82758621 0.73333333] mean value: 0.8011384934868544 key: train_fscore value: [0.95104895 0.95104895 0.94736842 0.94773519 0.95804196 0.95070423 0.95833333 0.95138889 0.94773519 0.94736842] mean value: 0.951077353309472 key: test_precision value: [0.76470588 0.8 0.83333333 0.7 0.77777778 0.66666667 0.73333333 1. 
0.85714286 0.73333333] mean value: 0.7866293183940243 key: train_precision value: [0.93150685 0.93150685 0.93103448 0.92517007 0.93835616 0.9375 0.93877551 0.93197279 0.93150685 0.9375 ] mean value: 0.9334829562434327 key: test_recall value: [0.8125 0.75 0.9375 0.875 0.875 0.75 0.73333333 0.93333333 0.8 0.73333333] mean value: 0.82 key: train_recall value: [0.97142857 0.97142857 0.96428571 0.97142857 0.97857143 0.96428571 0.9787234 0.97163121 0.96453901 0.95744681] mean value: 0.9693768996960486 key: test_roc_auc value: [0.75240385 0.75961538 0.85336538 0.70673077 0.77083333 0.625 0.71282051 0.96666667 0.82307692 0.71282051] mean value: 0.7683333333333333 key: train_roc_auc value: [0.94223602 0.94223602 0.9386646 0.9378882 0.95049261 0.94334975 0.95023127 0.94233734 0.93879124 0.93959297] mean value: 0.9425820030714127 key: test_jcc value: [0.65 0.63157895 0.78947368 0.63636364 0.7 0.54545455 0.57894737 0.93333333 0.70588235 0.57894737] mean value: 0.6749981236513745 key: train_jcc value: [0.90666667 0.90666667 0.9 0.90066225 0.91946309 0.90604027 0.92 0.90728477 0.90066225 0.9 ] mean value: 0.906744596056121 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01031065 0.01042509 0.00919986 0.01029253 0.00933981 0.00958133 0.01031375 0.0097096 0.00932622 0.00971889] mean value: 0.009821772575378418 key: score_time value: [0.00951123 0.01605248 0.00885272 0.00941873 0.00853109 0.00870562 0.00875068 0.00936747 0.00857401 0.00866628] mean value: 0.009643030166625977 key: test_mcc value: [0.36720991 0.51675233 0.29458249 0.51308782 0.55943093 0.17660431 0.20672456 0.64084613 0.42564103 0.27928963] mean value: 0.3980169135730911 key: train_mcc value: [0.52291851 0.49871807 0.53884849 0.47460238 0.52523163 0.53332692 0.56453014 0.49226967 0.540165 0.55613108] mean value: 0.524674189335105 key: test_accuracy value: [0.68965517 0.75862069 0.65517241 0.75862069 0.78571429 0.60714286 0.60714286 0.82142857 0.71428571 0.64285714] mean value: 0.704064039408867 key: train_accuracy value: [0.76470588 0.75294118 0.77254902 0.74117647 0.765625 0.76953125 0.78515625 0.75 0.7734375 0.78125 ] mean value: 0.7656372549019608 key: test_fscore value: [0.74285714 0.77419355 0.72222222 0.8 0.82352941 0.68571429 0.64516129 0.83870968 0.73333333 0.70588235] mean value: 0.7471603264961899 key: train_fscore value: [0.79591837 0.78350515 0.80136986 0.7739726 0.79310345 0.79863481 0.80836237 0.7852349 0.80136986 0.80821918] mean value: 0.7949690558064818 key: test_precision value: [0.68421053 0.8 0.65 0.73684211 0.77777778 0.63157895 0.625 0.8125 0.73333333 0.63157895] mean value: 0.70828216374269 key: train_precision value: [0.75974026 0.75496689 0.76973684 0.74342105 0.76666667 0.76470588 0.79452055 0.74522293 0.77483444 0.78145695] mean value: 
0.7655272459523916 key: test_recall value: [0.8125 0.75 0.8125 0.875 0.875 0.75 0.66666667 0.86666667 0.73333333 0.8 ] mean value: 0.7941666666666667 key: train_recall value: [0.83571429 0.81428571 0.83571429 0.80714286 0.82142857 0.83571429 0.82269504 0.82978723 0.82978723 0.83687943] mean value: 0.8269148936170213 key: test_roc_auc value: [0.67548077 0.75961538 0.63701923 0.74519231 0.77083333 0.58333333 0.6025641 0.81794872 0.71282051 0.63076923] mean value: 0.6935576923076923 key: train_roc_auc value: [0.75698758 0.74627329 0.76568323 0.73400621 0.75985222 0.76268473 0.78091274 0.74098057 0.76706753 0.77496146] mean value: 0.7589409550543877 key: test_jcc value: [0.59090909 0.63157895 0.56521739 0.66666667 0.7 0.52173913 0.47619048 0.72222222 0.57894737 0.54545455] mean value: 0.5998925838971605 key: train_jcc value: [0.66101695 0.6440678 0.66857143 0.63128492 0.65714286 0.66477273 0.67836257 0.64640884 0.66857143 0.67816092] mean value: 0.6598360435940921 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision 
Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.11305761 0.07152939 0.07408118 0.0680635 0.06910729 0.07664919 0.07117009 0.09131765 0.06809759 0.07645154] mean value: 0.07795250415802002 key: score_time value: [0.01137042 0.0111258 0.01030493 0.01230121 0.01032948 0.01086855 0.01022887 0.01089406 0.01038265 0.0107193 ] mean value: 0.010852527618408204 key: test_mcc value: [0.44230769 0.50973276 0.51675233 0.6505161 0.48553038 0.64019064 0.43589744 0.74885534 0.64450339 0.66151858] mean value: 0.5735804639693557 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.72413793 0.75862069 0.75862069 0.82758621 0.75 0.82142857 0.71428571 0.85714286 0.82142857 0.82142857] mean value: 0.7854679802955665 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.78787879 0.77419355 0.84848485 0.78787879 0.85714286 0.71428571 0.84615385 0.82758621 0.81481481] mean value: 0.8008419411923305 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.76470588 0.8 0.82352941 0.76470588 0.78947368 0.76923077 1. 0.85714286 0.91666667] mean value: 0.8235455153721407 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.8125 0.75 0.875 0.8125 0.9375 0.66666667 0.73333333 0.8 0.73333333] mean value: 0.7870833333333334 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.72115385 0.75240385 0.75961538 0.82211538 0.73958333 0.80208333 0.71794872 0.86666667 0.82307692 0.82820513] mean value: 0.7832852564102564 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.65 0.63157895 0.73684211 0.65 0.75 0.55555556 0.73333333 0.70588235 0.6875 ] mean value: 0.6700692294461644 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.55 Accuracy on Blind test: 0.78 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04733849 0.04470849 0.05614376 0.0619967 0.06084156 0.06722283 0.06452966 0.06317711 0.07480168 0.04873967] mean value: 0.05894999504089356 key: score_time value: [0.01197529 0.01792359 0.02341628 0.01972723 0.01569057 0.01621079 0.02333379 0.02047038 0.02111411 0.01213336] mean value: 0.01819953918457031 key: test_mcc value: [0.2956562 0.23923719 0.36720991 0.44230769 0.33113309 0.55943093 0.28205128 0.57080582 0.43589744 0.05337605] mean value: 0.3577105589280532 key: train_mcc value: [0.87333821 0.84147447 0.80993789 0.8573396 0.84231844 0.87490348 0.88155407 0.84231285 0.8577884 0.86572261] mean value: 0.8546690004567509 key: test_accuracy value: [0.65517241 0.62068966 0.68965517 0.72413793 0.67857143 
0.78571429 0.64285714 0.78571429 0.71428571 0.53571429] mean value: 0.6832512315270935 key: train_accuracy value: [0.9372549 0.92156863 0.90588235 0.92941176 0.921875 0.9375 0.94140625 0.921875 0.9296875 0.93359375] mean value: 0.9280055147058823 key: test_fscore value: [0.70588235 0.64516129 0.74285714 0.75 0.74285714 0.82352941 0.66666667 0.8125 0.71428571 0.60606061] mean value: 0.7209800327755735 key: train_fscore value: [0.94366197 0.92907801 0.91428571 0.93617021 0.92957746 0.94444444 0.94736842 0.93055556 0.93661972 0.93992933] mean value: 0.9351690845840186 key: test_precision value: [0.66666667 0.66666667 0.68421053 0.75 0.68421053 0.77777778 0.66666667 0.76470588 0.76923077 0.55555556] mean value: 0.6985691037548623 key: train_precision value: [0.93055556 0.92253521 0.91428571 0.92957746 0.91666667 0.91891892 0.9375 0.91156463 0.93006993 0.93661972] mean value: 0.9248293805713322 key: test_recall value: [0.75 0.625 0.8125 0.75 0.8125 0.875 0.66666667 0.86666667 0.66666667 0.66666667] mean value: 0.7491666666666666 key: train_recall value: [0.95714286 0.93571429 0.91428571 0.94285714 0.94285714 0.97142857 0.95744681 0.95035461 0.94326241 0.94326241] mean value: 0.9458611955420466 key: test_roc_auc value: [0.64423077 0.62019231 0.67548077 0.72115385 0.65625 0.77083333 0.64102564 0.77948718 0.71794872 0.52564103] mean value: 0.675224358974359 key: train_roc_auc value: [0.93509317 0.92003106 0.90496894 0.92795031 0.91970443 0.93399015 0.93959297 0.91865557 0.92815294 0.93250077] mean value: 0.9260640310543817 key: test_jcc value: [0.54545455 0.47619048 0.59090909 0.6 0.59090909 0.7 0.5 0.68421053 0.55555556 0.43478261] mean value: 0.56780118940302 key: train_jcc value: [0.89333333 0.86754967 0.84210526 0.88 0.86842105 0.89473684 0.9 0.87012987 0.8807947 0.88666667] mean value: 0.8783737398885534 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.04268551 0.00933933 0.00895882 0.00897408 0.00992012 0.00953245 0.00947046 0.00959587 0.00973368 0.01020265] mean value: 0.012841296195983887 key: score_time value: [0.00958109 0.00871778 0.00853181 0.00853419 0.00926518 0.00934649 0.00859356 0.00852966 0.00941658 0.00943685] mean value: 0.008995318412780761 key: test_mcc value: [ 0.43855669 0.03827795 0.36894943 0.50973276 0.71004695 0.41666667 -0.02738134 0.78555332 0.43262512 0.13091876] mean value: 0.38039463122020395 key: train_mcc value: [0.49076747 0.48287166 0.45854085 0.42616368 0.46985293 0.46939037 0.50035102 0.46804282 0.49226675 0.48419112] mean value: 0.47424386837888466 key: test_accuracy value: [0.72413793 0.51724138 0.68965517 0.75862069 0.85714286 0.71428571 0.5 0.89285714 0.71428571 0.57142857] mean value: 0.6939655172413793 key: train_accuracy value: [0.74901961 0.74509804 0.73333333 0.71764706 0.73828125 0.73828125 0.75390625 0.73828125 0.75 0.74609375] mean value: 0.7409941789215686 key: test_fscore value: [0.76470588 0.53333333 0.72727273 0.78787879 0.88235294 0.75 0.58823529 0.90322581 0.76470588 0.625 ] mean value: 0.7326710654936461 key: train_fscore value: [0.77931034 0.77508651 0.76712329 0.75675676 0.77591973 0.76975945 0.78350515 0.77441077 0.78082192 0.778157 ] mean value: 0.774085092050438 key: test_precision value: [0.72222222 0.57142857 
0.70588235 0.76470588 0.83333333 0.75 0.52631579 0.875 0.68421053 0.58823529] mean value: 0.7021333972185365 key: train_precision value: [0.75333333 0.75167785 0.73684211 0.71794872 0.72955975 0.74172185 0.76 0.73717949 0.75496689 0.75 ] mean value: 0.7433229986223217 key: test_recall value: [0.8125 0.5 0.75 0.8125 0.9375 0.75 0.66666667 0.93333333 0.86666667 0.66666667] mean value: 0.7695833333333333 key: train_recall value: [0.80714286 0.8 0.8 0.8 0.82857143 0.8 0.80851064 0.81560284 0.80851064 0.80851064] mean value: 0.8076849037487336 key: test_roc_auc value: [0.71394231 0.51923077 0.68269231 0.75240385 0.84375 0.70833333 0.48717949 0.88974359 0.7025641 0.56410256] mean value: 0.6863942307692308 key: train_roc_auc value: [0.74270186 0.73913043 0.72608696 0.70869565 0.72894089 0.73189655 0.74773358 0.72954055 0.74338575 0.73903793] mean value: 0.7337150155925077 key: test_jcc value: [0.61904762 0.36363636 0.57142857 0.65 0.78947368 0.6 0.41666667 0.82352941 0.61904762 0.45454545] mean value: 0.5907375390347527 key: train_jcc value: [0.63841808 0.63276836 0.62222222 0.60869565 0.63387978 0.62569832 0.6440678 0.63186813 0.64044944 0.63687151] mean value: 0.631493929557765 MCC on Blind test: 0.31 Accuracy on Blind test: 0.66 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01471472 0.01537204 0.0144701 0.01906157 0.01802874 0.01486659 0.01644754 0.0162828 0.0146687 0.01574802] mean value: 0.015966081619262697 key: score_time value: [0.00864053 0.01096821 0.01096869 0.01156044 0.01153564 0.01147151 0.01165247 0.01153016 0.01147318 0.01150846] mean value: 0.011130928993225098 key: test_mcc value: [0.2956562 0.4444578 0.44230769 0.72115385 0.4330127 0.27083333 0.27174649 0.85641026 0.45479403 0.35805744] mean value: 0.4548429779078069 key: train_mcc value: [0.78686712 0.58530546 0.73027531 0.74166179 0.71577977 0.7553501 0.7069685 0.77942077 0.531924 0.69196476] mean value: 0.702551757572427 key: test_accuracy value: [0.65517241 0.72413793 0.72413793 0.86206897 0.71428571 0.64285714 0.60714286 0.92857143 0.71428571 0.67857143] mean value: 0.7251231527093596 key: train_accuracy value: [0.89019608 0.76862745 0.86666667 0.85882353 0.8515625 0.87890625 0.83203125 0.890625 0.73828125 0.83203125] mean value: 0.8407751225490196 key: test_fscore value: [0.70588235 0.77777778 0.75 0.875 0.73333333 0.6875 0.52173913 0.93333333 0.77777778 0.68965517] mean value: 0.7451998878011974 key: train_fscore value: [0.90728477 0.8259587 0.88028169 0.856 0.85271318 0.89122807 0.82157676 0.9 0.80802292 0.82730924] mean value: 0.8570375331957046 key: test_precision value: [0.66666667 0.7 0.75 0.875 0.78571429 0.6875 0.75 0.93333333 0.66666667 0.71428571] mean value: 0.7529166666666667 key: train_precision value: [0.84567901 0.70351759 0.86805556 0.97272727 0.93220339 0.87586207 0.99 0.90647482 0.67788462 0.9537037 ] mean 
value: 0.8726108026596435 key: test_recall value: [0.75 0.875 0.75 0.875 0.6875 0.6875 0.4 0.93333333 0.93333333 0.66666667] mean value: 0.7558333333333334 key: train_recall value: [0.97857143 1. 0.89285714 0.76428571 0.78571429 0.90714286 0.70212766 0.89361702 1. 0.73049645] mean value: 0.8654812563323202 key: test_roc_auc value: [0.64423077 0.70673077 0.72115385 0.86057692 0.71875 0.63541667 0.62307692 0.92820513 0.6974359 0.67948718] mean value: 0.7215064102564103 key: train_roc_auc value: [0.88059006 0.74347826 0.86381988 0.86909938 0.85837438 0.87598522 0.846716 0.89028677 0.70869565 0.8435091 ] mean value: 0.8380554707448707 key: test_jcc value: [0.54545455 0.63636364 0.6 0.77777778 0.57894737 0.52380952 0.35294118 0.875 0.63636364 0.52631579] mean value: 0.6052973454134445 key: train_jcc value: [0.83030303 0.70351759 0.78616352 0.74825175 0.74324324 0.80379747 0.6971831 0.81818182 0.67788462 0.70547945] mean value: 0.7514005584317507 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01621699 0.01477242 0.01502943 0.01693106 0.01710677 0.01634216 0.01618671 0.01679707 0.01618743 0.01548648] mean value: 0.01610565185546875 key: score_time value: [0.01161408 0.01149511 0.01152277 0.01150894 0.01155853 0.01150775 0.01153326 0.01155472 0.01158786 0.01207876] mean value: 0.011596179008483887 key: test_mcc value: [0.46375229 0.2956562 0.5943331 0.6505161 0.41666667 0.33113309 0.22739701 0.69388867 0.4241768 0.27928963] mean value: 0.43768095519341854 key: train_mcc value: [0.57308036 0.71593148 0.66101414 0.77134643 0.81121707 0.81285468 0.75895878 0.69508372 0.68195933 0.68578508] mean value: 0.7167231067913732 key: test_accuracy value: [0.72413793 0.65517241 0.79310345 0.82758621 0.71428571 0.67857143 0.60714286 0.82142857 0.71428571 0.64285714] mean value: 0.7178571428571429 key: train_accuracy value: [0.76470588 0.85490196 0.82745098 0.88627451 0.90625 0.90625 0.87109375 0.8359375 0.828125 0.8359375 ] mean value: 0.8516927083333333 key: test_fscore value: [0.78947368 0.70588235 0.83333333 0.84848485 0.75 0.74285714 0.59259259 0.8 0.75 0.70588235] mean value: 0.7518506307360797 key: train_fscore value: [0.82248521 0.87868852 0.85714286 0.90034364 0.91366906 0.91780822 0.87159533 0.83333333 0.86419753 0.86708861] mean value: 0.8726352317903348 key: test_precision value: [0.68181818 0.66666667 0.75 0.82352941 0.75 0.68421053 0.66666667 1. 
0.70588235 0.63157895] mean value: 0.7360352753541608 key: train_precision value: [0.7020202 0.81212121 0.78571429 0.86754967 0.92028986 0.88157895 0.96551724 0.94594595 0.76502732 0.78285714] mean value: 0.8428621823757527 key: test_recall value: [0.9375 0.75 0.9375 0.875 0.75 0.8125 0.53333333 0.66666667 0.8 0.8 ] mean value: 0.78625 key: train_recall value: [0.99285714 0.95714286 0.94285714 0.93571429 0.90714286 0.95714286 0.79432624 0.74468085 0.9929078 0.97163121] mean value: 0.9196403242147924 key: test_roc_auc value: [0.69951923 0.64423077 0.77644231 0.82211538 0.70833333 0.65625 0.61282051 0.83333333 0.70769231 0.63076923] mean value: 0.709150641025641 key: train_roc_auc value: [0.73990683 0.84378882 0.81490683 0.88090062 0.90615764 0.90098522 0.87977182 0.84625347 0.80949738 0.82059821] mean value: 0.8442766838465265 key: test_jcc value: [0.65217391 0.54545455 0.71428571 0.73684211 0.6 0.59090909 0.42105263 0.66666667 0.6 0.54545455] mean value: 0.6072839212656146 key: train_jcc value: [0.69849246 0.78362573 0.75 0.81875 0.8410596 0.84810127 0.77241379 0.71428571 0.76086957 0.76536313] mean value: 0.7752961262875675 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.14341903 0.12807274 0.13190699 0.12588644 0.12666917 0.1267252 0.1269629 0.12768006 0.1299634 0.13459158] mean value: 0.1301877498626709 key: score_time value: [0.01633096 0.01517463 0.01501942 0.01492524 0.01512504 0.0148921 0.01571321 0.01498175 0.01585507 0.01626587] mean value: 0.01542832851409912 key: test_mcc value: [0.6505161 0.37799476 0.6505161 0.43855669 0.40881491 0.33113309 0.43589744 0.69388867 0.4555973 0.35143175] mean value: 0.4794346794243034 key: train_mcc value: [0.99210575 1. 0.98426071 0.99211795 0.99214326 0.99214326 1. 0.98430987 0.9921307 0.98430987] mean value: 0.991352136995068 key: test_accuracy value: [0.82758621 0.68965517 0.82758621 0.72413793 0.71428571 0.67857143 0.71428571 0.82142857 0.71428571 0.67857143] mean value: 0.739039408866995 key: train_accuracy value: [0.99607843 1. 0.99215686 0.99607843 0.99609375 0.99609375 1. 0.9921875 0.99609375 0.9921875 ] mean value: 0.9956969975490196 key: test_fscore value: [0.84848485 0.70967742 0.84848485 0.76470588 0.76470588 0.74285714 0.71428571 0.8 0.69230769 0.70967742] mean value: 0.7595186849835807 key: train_fscore value: [0.99644128 1. 0.9929078 0.99641577 0.99644128 0.99644128 1. 0.99295775 0.99646643 0.99295775] mean value: 0.9961029339497282 key: test_precision value: [0.82352941 0.73333333 0.82352941 0.72222222 0.72222222 0.68421053 0.76923077 1. 0.81818182 0.6875 ] mean value: 0.7783959715035567 key: train_precision value: [0.9929078 1. 0.98591549 1. 0.9929078 0.9929078 1. 
0.98601399 0.99295775 0.98601399] mean value: 0.9929624615719911 key: test_recall value: [0.875 0.6875 0.875 0.8125 0.8125 0.8125 0.66666667 0.66666667 0.6 0.73333333] mean value: 0.7541666666666667 key: train_recall value: [1. 1. 1. 0.99285714 1. 1. 1. 1. 1. 1. ] mean value: 0.9992857142857143 key: test_roc_auc value: [0.82211538 0.68990385 0.82211538 0.71394231 0.69791667 0.65625 0.71794872 0.83333333 0.72307692 0.67435897] mean value: 0.7350961538461538 key: train_roc_auc value: [0.99565217 1. 0.99130435 0.99642857 0.99568966 0.99568966 1. 0.99130435 0.99565217 0.99130435] mean value: 0.9953025273077747 key: test_jcc value: [0.73684211 0.55 0.73684211 0.61904762 0.61904762 0.59090909 0.55555556 0.66666667 0.52941176 0.55 ] mean value: 0.615432252645875 key: train_jcc value: [0.9929078 1. 0.98591549 0.99285714 0.9929078 0.9929078 1. 0.98601399 0.99295775 0.98601399] mean value: 0.9922481758577054 MCC on Blind test: 0.48 Accuracy on Blind test: 0.74 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, 
gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.04842234 0.05589557 0.07119823 0.05455518 0.04850173 0.06984067 0.06670141 0.06200743 0.05922127 0.05085707] mean value: 0.05872008800506592 key: score_time value: [0.01725745 0.0303607 0.02712083 0.02353597 0.02750874 0.03366137 0.03804755 0.0257113 0.02761889 0.02240229] mean value: 0.02732250690460205 key: test_mcc value: [0.58173077 0.51675233 0.61653391 0.58145719 0.5625 0.55943093 0.50128041 0.69388867 0.66151858 0.64450339] mean value: 0.591959617459089 key: train_mcc value: [0.98430913 0.97629123 0.98430913 0.97657181 0.96065003 0.98438167 0.97664764 0.97664764 0.95392353 0.97664764] mean value: 0.9750379441587029 key: test_accuracy value: [0.79310345 0.75862069 0.79310345 0.79310345 0.78571429 0.78571429 0.75 0.82142857 0.82142857 0.82142857] mean value: 0.7923645320197045 key: train_accuracy value: [0.99215686 0.98823529 0.99215686 0.98823529 0.98046875 0.9921875 0.98828125 0.98828125 0.9765625 0.98828125] mean value: 0.9874846813725491 key: test_fscore value: [0.8125 0.77419355 0.78571429 0.82352941 0.8125 0.82352941 0.75862069 0.8 0.81481481 0.82758621] mean value: 0.8032988368997334 key: train_fscore value: [0.99280576 0.98924731 0.99280576 0.98916968 0.98207885 0.99280576 0.98924731 0.98924731 0.97826087 0.98924731] mean value: 0.9884915911200943 key: test_precision value: [0.8125 0.8 0.91666667 0.77777778 0.8125 0.77777778 0.78571429 1. 0.91666667 0.85714286] mean value: 0.8456746031746032 key: train_precision value: [1. 0.99280576 1. 1. 0.98561151 1. 1. 1. 1. 1. 
] mean value: 0.9978417266187051 key: test_recall value: [0.8125 0.75 0.6875 0.875 0.8125 0.875 0.73333333 0.66666667 0.73333333 0.8 ] mean value: 0.7745833333333333 key: train_recall value: [0.98571429 0.98571429 0.98571429 0.97857143 0.97857143 0.98571429 0.9787234 0.9787234 0.95744681 0.9787234 ] mean value: 0.9793617021276596 key: test_roc_auc value: [0.79086538 0.75961538 0.80528846 0.78365385 0.78125 0.77083333 0.75128205 0.83333333 0.82820513 0.82307692] mean value: 0.7927403846153847 key: train_roc_auc value: [0.99285714 0.98850932 0.99285714 0.98928571 0.98066502 0.99285714 0.9893617 0.9893617 0.9787234 0.9893617 ] mean value: 0.9883839994896169 key: test_jcc value: [0.68421053 0.63157895 0.64705882 0.7 0.68421053 0.7 0.61111111 0.66666667 0.6875 0.70588235] mean value: 0.6718218954248366 key: train_jcc value: [0.98571429 0.9787234 0.98571429 0.97857143 0.96478873 0.98571429 0.9787234 0.9787234 0.95744681 0.9787234 ] mean value: 0.9772843443640566 MCC on Blind test: 0.51 Accuracy on Blind test: 0.75 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.06892204 0.04836154 0.03385496 0.03343391 0.08310866 0.0448885 0.03518319 0.07023501 0.04377747 0.04133487] mean value: 0.05031001567840576 key: score_time value: [0.0234127 0.01284838 0.01282406 0.01289606 0.02261782 0.01286912 0.01278472 0.02182031 0.01295614 0.01290345] mean value: 0.015793275833129884 key: test_mcc value: [0.29458249 0.37799476 0.13968442 0.36720991 0.41666667 0.5625 0.20282899 0.71743483 0.36232865 0.35228194] mean value: 0.3793512667922323 key: train_mcc value: [0.99210575 0.99210575 0.98416149 1. 0.98433579 0.99214326 0.9921307 0.9921307 0.98430987 0.98430987] mean value: 0.9897733186760725 key: test_accuracy value: [0.65517241 0.68965517 0.5862069 0.68965517 0.71428571 0.78571429 0.60714286 0.85714286 0.67857143 0.67857143] mean value: 0.6942118226600985 key: train_accuracy value: [0.99607843 0.99607843 0.99215686 1. 0.9921875 0.99609375 0.99609375 0.99609375 0.9921875 0.9921875 ] mean value: 0.9949157475490196 key: test_fscore value: [0.72222222 0.70967742 0.68421053 0.74285714 0.75 0.8125 0.66666667 0.875 0.74285714 0.72727273] mean value: 0.743326384754653 key: train_fscore value: [0.99644128 0.99644128 0.99285714 1. 0.9929078 0.99644128 0.99646643 0.99646643 0.99295775 0.99295775] mean value: 0.9953937142840512 key: test_precision value: [0.65 0.73333333 0.59090909 0.68421053 0.75 0.8125 0.61111111 0.82352941 0.65 0.66666667] mean value: 0.6972260140100698 key: train_precision value: [0.9929078 0.9929078 0.99285714 1. 
0.98591549 0.9929078 0.99295775 0.99295775 0.98601399 0.98601399] mean value: 0.9915439505055927 key: test_recall value: [0.8125 0.6875 0.8125 0.8125 0.75 0.8125 0.73333333 0.93333333 0.86666667 0.8 ] mean value: 0.8020833333333334 key: train_recall value: [1. 1. 0.99285714 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9992857142857143 key: test_roc_auc value: [0.63701923 0.68990385 0.56009615 0.67548077 0.70833333 0.78125 0.5974359 0.85128205 0.66410256 0.66923077] mean value: 0.6834134615384615 key: train_roc_auc value: [0.99565217 0.99565217 0.99208075 1. 0.99137931 0.99568966 0.99565217 0.99565217 0.99130435 0.99130435] mean value: 0.9944367102163204 key: test_jcc value: [0.56521739 0.55 0.52 0.59090909 0.6 0.68421053 0.5 0.77777778 0.59090909 0.57142857] mean value: 0.5950452448644669 key: train_jcc value: [0.9929078 0.9929078 0.9858156 1. 0.98591549 0.9929078 0.99295775 0.99295775 0.98601399 0.98601399] mean value: 0.9908397965035664 MCC on Blind test: 0.25 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.47307181 0.44717479 0.45278549 0.45123696 0.45926499 0.45636559 0.45906496 0.45645404 0.44945335 0.45685482] mean value: 0.45617268085479734 key: score_time value: [0.00952506 0.00908566 0.00928783 0.0093286 0.009516 0.00935721 0.00914192 0.00926089 0.01024389 0.00966859] mean value: 0.009441566467285157 key: test_mcc value: [0.6505161 0.51675233 0.58173077 0.6505161 0.40881491 0.64019064 0.50128041 0.72307692 0.57948718 0.51681139] mean value: 0.5769176744958047 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.82758621 0.75862069 0.79310345 0.82758621 0.71428571 0.82142857 0.75 0.85714286 0.78571429 0.75 ] mean value: 0.7885467980295566 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84848485 0.77419355 0.8125 0.84848485 0.76470588 0.85714286 0.75862069 0.85714286 0.78571429 0.74074074] mean value: 0.8047730558105648 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.82352941 0.8 0.8125 0.82352941 0.72222222 0.78947368 0.78571429 0.92307692 0.84615385 0.83333333] mean value: 0.8159533118240548 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.75 0.8125 0.875 0.8125 0.9375 0.73333333 0.8 0.73333333 0.66666667] mean value: 0.7995833333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.82211538 0.75961538 0.79086538 0.82211538 0.69791667 0.80208333 0.75128205 0.86153846 0.78974359 0.75641026] mean value: 0.7853685897435897 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.73684211 0.63157895 0.68421053 0.73684211 0.61904762 0.75 0.61111111 0.75 0.64705882 0.58823529] mean value: 0.6754926532016315 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.6 Accuracy on Blind test: 0.8 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), 
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02638793 0.02276492 0.02334905 0.02273154 0.03356719 0.02284646 0.02296495 0.02272868 0.02332139 0.02315092] mean value: 0.024381303787231447 key: score_time value: [0.01213121 0.01963043 0.01413655 0.01541519 0.01413393 0.01488996 0.01563334 0.01552153 0.01470804 0.01829457] mean value: 0.01544947624206543 key: test_mcc value: [0.39546094 0.36894943 0.221332 0.14470719 0.09128709 0.4 0.12687831 0.5859606 0.20380987 0.12403473] mean value: 0.2662420159139888 key: train_mcc value: [0.63202573 0.5986817 0.66534784 0.65201286 0.61444446 0.60121238 0.66594163 0.57250836 0.57922054 0.57922054] mean value: 0.616061603996789 key: test_accuracy value: [0.68965517 0.68965517 0.62068966 0.5862069 0.57142857 0.67857143 0.57142857 0.78571429 0.60714286 0.57142857] mean value: 
0.6371921182266009 key: train_accuracy value: [0.79607843 0.77647059 0.81568627 0.80784314 0.78515625 0.77734375 0.81640625 0.76171875 0.765625 0.765625 ] mean value: 0.7867953431372549 key: test_fscore value: [0.76923077 0.72727273 0.68571429 0.66666667 0.66666667 0.7804878 0.68421053 0.82352941 0.68571429 0.66666667] mean value: 0.7156159810890612 key: train_fscore value: [0.84337349 0.83086053 0.85626911 0.85106383 0.8358209 0.83086053 0.85714286 0.82215743 0.8245614 0.8245614 ] mean value: 0.8376671499247365 key: test_precision value: [0.65217391 0.70588235 0.63157895 0.6 0.6 0.64 0.56521739 0.73684211 0.6 0.57142857] mean value: 0.6303123281349152 key: train_precision value: [0.72916667 0.7106599 0.7486631 0.74074074 0.71794872 0.7106599 0.75 0.6980198 0.70149254 0.70149254] mean value: 0.7208843900521782 key: test_recall value: [0.9375 0.75 0.75 0.75 0.75 1. 0.86666667 0.93333333 0.8 0.8 ] mean value: 0.83375 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66105769 0.68269231 0.60576923 0.56730769 0.54166667 0.625 0.54871795 0.77435897 0.59230769 0.55384615] mean value: 0.615272435897436 key: train_roc_auc value: [0.77391304 0.75217391 0.79565217 0.78695652 0.76293103 0.75431034 0.79565217 0.73478261 0.73913043 0.73913043] mean value: 0.7634632683658171 key: test_jcc value: [0.625 0.57142857 0.52173913 0.5 0.5 0.64 0.52 0.7 0.52173913 0.5 ] mean value: 0.5599906832298136 key: train_jcc value: [0.72916667 0.7106599 0.7486631 0.74074074 0.71794872 0.7106599 0.75 0.6980198 0.70149254 0.70149254] mean value: 0.7208843900521782 MCC on Blind test: 0.14 Accuracy on Blind test: 0.59 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', 
SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02379489 0.01429152 0.02167296 0.03507352 0.01420116 0.01453209 0.02825117 0.03548551 0.034168 0.03530765] mean value: 0.02567784786224365 key: score_time value: [0.01208448 0.01182032 0.02128649 0.01187086 0.01207566 0.01173615 0.02239275 0.02398872 0.02039075 0.02078676] mean value: 0.016843295097351073 key: test_mcc value: [0.36894943 0.45455066 0.51308782 0.6505161 0.48553038 0.33113309 0.21483446 0.73929609 0.51681139 0.4241768 ] mean value: 0.4698886211837859 key: train_mcc value: [0.8097547 0.80188999 0.78596805 0.80974419 0.78701889 0.81887846 0.81910397 0.79449635 0.82618954 0.77076024] mean value: 0.8023804385252125 key: test_accuracy value: [0.68965517 0.72413793 0.75862069 0.82758621 0.75 0.67857143 0.60714286 0.85714286 0.75 0.71428571] mean value: 0.7357142857142858 key: train_accuracy value: [0.90588235 0.90196078 0.89411765 0.90588235 0.89453125 0.91015625 0.91015625 0.8984375 0.9140625 0.88671875] mean value: 0.9021905637254902 key: test_fscore value: [0.72727273 0.73333333 0.8 0.84848485 0.78787879 0.74285714 0.62068966 0.88235294 0.74074074 0.75 ] mean value: 0.7633610176916465 key: train_fscore value: [0.91549296 0.91103203 0.90526316 0.91489362 0.90526316 0.91756272 0.91756272 0.90909091 0.92307692 0.8989547 ] mean value: 0.9118192903056238 key: test_precision value: [0.70588235 0.78571429 0.73684211 0.82352941 0.76470588 0.68421053 0.64285714 0.78947368 0.83333333 0.70588235] mean value: 0.7472431077694236 key: train_precision value: [0.90277778 0.90780142 0.88965517 0.9084507 0.88965517 0.92086331 
0.92753623 0.89655172 0.91034483 0.88356164] mean value: 0.9037197982066763 key: test_recall value: [0.75 0.6875 0.875 0.875 0.8125 0.8125 0.6 1. 0.66666667 0.8 ] mean value: 0.7879166666666667 key: train_recall value: [0.92857143 0.91428571 0.92142857 0.92142857 0.92142857 0.91428571 0.90780142 0.92198582 0.93617021 0.91489362] mean value: 0.9202279635258358 key: test_roc_auc value: [0.68269231 0.72836538 0.74519231 0.82211538 0.73958333 0.65625 0.60769231 0.84615385 0.75641026 0.70769231] mean value: 0.7292147435897436 key: train_roc_auc value: [0.90341615 0.90062112 0.89114907 0.90419255 0.89174877 0.90972906 0.91042245 0.89577552 0.91156337 0.88353377] mean value: 0.9002151811632177 key: test_jcc value: [0.57142857 0.57894737 0.66666667 0.73684211 0.65 0.59090909 0.45 0.78947368 0.58823529 0.6 ] mean value: 0.6222502781016713 key: train_jcc value: [0.84415584 0.83660131 0.82692308 0.84313725 0.82692308 0.84768212 0.84768212 0.83333333 0.85714286 0.8164557 ] mean value: 0.8380036685182819 MCC on Blind test: 0.31 Accuracy on Blind test: 0.66 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.13301873 0.25432467 0.27289152 0.35769796 0.24618363 0.25176811 0.28279519 0.18110466 0.1442101 0.2438333 ] mean value: 0.23678278923034668 key: score_time value: [0.01179385 0.02128005 0.02014399 0.0181601 0.02356052 0.02306724 0.01739192 0.02378941 0.02485228 0.02287745] mean value: 0.020691680908203124 key: test_mcc value: [0.43855669 0.51675233 0.51308782 0.5943331 0.57054433 0.25819889 0.21483446 0.73929609 0.64450339 0.4241768 ] mean value: 0.4914283896819048 key: train_mcc value: [0.69116162 0.68292978 0.70648469 0.65069859 0.67620784 0.70049261 0.81910397 0.79449635 0.68431304 0.77076024] mean value: 0.7176648721292616 key: test_accuracy value: [0.72413793 0.75862069 0.75862069 0.79310345 0.78571429 0.64285714 0.60714286 0.85714286 0.82142857 0.71428571] mean value: 0.7463054187192119 key: train_accuracy value: [0.84705882 0.84313725 0.85490196 0.82745098 0.83984375 0.8515625 0.91015625 0.8984375 0.84375 0.88671875] mean value: 0.8603017769607844 key: test_fscore value: [0.76470588 0.77419355 0.8 0.83333333 0.83333333 0.70588235 0.62068966 0.88235294 0.82758621 0.75 ] mean value: 0.7792077253593317 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:115: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:118: 
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.86597938 0.86206897 0.87108014 0.84722222 0.85714286 0.86428571 0.91756272 0.90909091 0.86394558 0.8989547 ] mean value: 0.8757333195153447 key: test_precision value: [0.72222222 0.8 0.73684211 0.75 0.75 0.66666667 0.64285714 0.78947368 0.85714286 0.70588235] mean value: 0.7421087031303749 key: train_precision value: [0.83443709 0.83333333 0.85034014 0.82432432 0.83673469 0.86428571 0.92753623 0.89655172 0.83006536 0.88356164] mean value: 0.858117024730279 key: test_recall value: [0.8125 0.75 0.875 0.9375 0.9375 0.75 0.6 1. 
0.8 0.8 ] mean value: 0.82625 key: train_recall value: [0.9 0.89285714 0.89285714 0.87142857 0.87857143 0.86428571 0.90780142 0.92198582 0.90070922 0.91489362] mean value: 0.8945390070921986 key: test_roc_auc value: [0.71394231 0.75961538 0.74519231 0.77644231 0.76041667 0.625 0.60769231 0.84615385 0.82307692 0.70769231] mean value: 0.7365224358974359 key: train_roc_auc value: [0.84130435 0.83773292 0.8507764 0.82267081 0.83583744 0.85024631 0.91042245 0.89577552 0.83731113 0.88353377] mean value: 0.8565611077440003 key: test_jcc value: [0.61904762 0.63157895 0.66666667 0.71428571 0.71428571 0.54545455 0.45 0.78947368 0.70588235 0.6 ] mean value: 0.6436675244260384 key: train_jcc value: [0.76363636 0.75757576 0.77160494 0.73493976 0.75 0.76100629 0.84768212 0.83333333 0.76047904 0.8164557 ] mean value: 0.7796713298485378 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03288984 0.03522205 0.03412771 0.03215051 0.03374076 0.03229523 0.03486323 0.03421307 0.0346148 0.03472567] mean value: 0.033884286880493164 key: score_time value: [0.01216006 0.01787066 0.01389241 0.01186514 0.01227212 0.01199508 0.01420188 0.01425934 0.0143292 0.01416445] mean value: 0.013701033592224122 key: test_mcc value: [0.31814238 0.37796447 0.48333333 0.68826048 0.6778302 0.48333333 0.61608311 0.69203857 0.4184137 0.4184137 ] mean value: 0.5173813291351835 key: train_mcc value: [0.75722013 0.72864578 0.74385734 0.70836501 0.71529889 0.73015914 0.71535695 0.70820669 0.74383139 0.73679947] mean value: 0.7287740786080976 key: test_accuracy value: [0.65625 0.6875 0.74193548 0.83870968 0.83870968 0.74193548 0.80645161 0.83870968 0.70967742 0.70967742] mean value: 0.7569556451612903 key: train_accuracy value: [0.87857143 0.86428571 0.87188612 0.85409253 0.85765125 0.86476868 0.85765125 0.85409253 0.87188612 0.8683274 ] mean value: 0.8643213014743264 key: test_fscore value: [0.68571429 0.70588235 0.75 0.85714286 0.84848485 0.75 0.78571429 0.84848485 0.68965517 0.68965517] mean value: 0.7610733823309888 key: train_fscore value: [0.87943262 0.86524823 0.87234043 0.85512367 0.85714286 0.86131387 0.85915493 0.85409253 0.87323944 0.87017544] mean value: 0.8647264008747467 key: test_precision value: [0.63157895 0.66666667 0.75 0.78947368 0.82352941 0.75 0.84615385 0.77777778 0.71428571 0.71428571] mean value: 0.7463751762513372 key: train_precision value: [0.87323944 0.85915493 0.86619718 0.84615385 0.85714286 0.88059701 0.85314685 0.85714286 
0.86713287 0.86111111] mean value: 0.8621018956051539 key: test_recall value: [0.75 0.75 0.75 0.9375 0.875 0.75 0.73333333 0.93333333 0.66666667 0.66666667] mean value: 0.78125 key: train_recall value: [0.88571429 0.87142857 0.87857143 0.86428571 0.85714286 0.84285714 0.86524823 0.85106383 0.87943262 0.87943262] mean value: 0.8675177304964539 key: test_roc_auc value: [0.65625 0.6875 0.74166667 0.83541667 0.8375 0.74166667 0.80416667 0.84166667 0.70833333 0.70833333] mean value: 0.75625 key: train_roc_auc value: [0.87857143 0.86428571 0.87190983 0.85412867 0.85764944 0.86469098 0.85762411 0.85410334 0.87185917 0.86828774] mean value: 0.8643110435663628 key: test_jcc value: [0.52173913 0.54545455 0.6 0.75 0.73684211 0.6 0.64705882 0.73684211 0.52631579 0.52631579] mean value: 0.6190568288892424 key: train_jcc value: [0.78481013 0.7625 0.77358491 0.74691358 0.75 0.75641026 0.75308642 0.74534161 0.775 0.77018634] mean value: 0.7617833238963472 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.77032852 0.97276163 0.80582404 0.79673815 0.8783679 0.79550815 0.79311538 0.96027112 0.78945112 0.9483645 ] mean value: 0.8510730504989624 key: score_time value: [0.01441407 0.01222777 0.01214099 0.01213002 0.01198173 0.01428342 0.01189327 0.01310682 0.01192641 0.01187658] mean value: 0.012598109245300294 key: test_mcc value: [0.31311215 0.25 0.54812195 0.68826048 0.6778302 0.37616262 0.61608311 0.69203857 0.48333333 0.48527095] mean value: 0.5130213359522333 key: train_mcc value: [0.91465912 0.64450339 0.68018255 0.6728996 0.99290744 1. 0.64434879 0.6442069 0.66585571 0.69434748] mean value: 0.755391096954088 key: test_accuracy value: [0.65625 0.625 0.77419355 0.83870968 0.83870968 0.67741935 0.80645161 0.83870968 0.74193548 0.74193548] mean value: 0.7539314516129032 key: train_accuracy value: [0.95714286 0.82142857 0.83985765 0.83629893 0.99644128 1. 
0.82206406 0.82206406 0.83274021 0.84697509] mean value: 0.8775012709710218 key: test_fscore value: [0.66666667 0.625 0.78787879 0.85714286 0.84848485 0.73684211 0.78571429 0.84848485 0.73333333 0.71428571] mean value: 0.76038334472545 key: train_fscore value: [0.95774648 0.82758621 0.84210526 0.83802817 0.99641577 1. 0.82517483 0.82142857 0.83623693 0.85017422] mean value: 0.8794896434980269 key: test_precision value: [0.64705882 0.625 0.76470588 0.78947368 0.82352941 0.63636364 0.84615385 0.77777778 0.73333333 0.76923077] mean value: 0.7412627164716948 key: train_precision value: [0.94444444 0.8 0.82758621 0.82638889 1. 1. 0.8137931 0.82733813 0.82191781 0.83561644] mean value: 0.8697085019749906 key: test_recall value: [0.6875 0.625 0.8125 0.9375 0.875 0.875 0.73333333 0.93333333 0.73333333 0.66666667] mean value: 0.7879166666666666 key: train_recall value: [0.97142857 0.85714286 0.85714286 0.85 0.99285714 1. 0.83687943 0.81560284 0.85106383 0.86524823] mean value: 0.8897365754812563 key: test_roc_auc value: [0.65625 0.625 0.77291667 0.83541667 0.8375 0.67083333 0.80416667 0.84166667 0.74166667 0.73958333] mean value: 0.7525 key: train_roc_auc value: [0.95714286 0.82142857 0.83991895 0.83634752 0.99642857 1. 0.82201114 0.82208713 0.83267477 0.84690983] mean value: 0.8774949341438704 key: test_jcc value: [0.5 0.45454545 0.65 0.75 0.73684211 0.58333333 0.64705882 0.73684211 0.57894737 0.55555556] mean value: 0.6193124745911124 key: train_jcc value: [0.91891892 0.70588235 0.72727273 0.72121212 0.99285714 1. 
0.70238095 0.6969697 0.71856287 0.73939394] mean value: 0.7923450726198172 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01640129 0.01214981 0.01048613 0.01034427 0.01015806 0.01013517 0.01000237 0.01033115 0.01018715 0.01006889] mean value: 0.011026430130004882 key: score_time value: [0.01300478 0.00980377 0.00967383 0.00959897 0.00941086 0.00940037 0.009377 0.00936294 0.00931811 0.00939512] mean value: 0.009834575653076171 key: test_mcc value: [0.46056619 0.06579517 0.49612132 0.37616262 0.55777335 0.61608311 0.54812195 0.58316015 0.48954403 0.37191715] mean value: 0.45652450481312273 key: train_mcc value: [0.49419142 0.48038446 0.53031544 0.47188822 0.5319049 0.53856487 0.53744808 0.52073939 0.49113636 0.54241467] mean value: 0.5138987824814698 key: test_accuracy value: [0.71875 0.53125 0.74193548 0.67741935 0.74193548 0.80645161 0.77419355 0.77419355 0.74193548 0.67741935] mean value: 0.7185483870967742 key: train_accuracy value: [0.73571429 0.725 0.75800712 0.72953737 0.76156584 0.76156584 0.76868327 0.75088968 0.7366548 0.76512456] mean value: 0.7492742755465175 key: test_fscore value: [0.75675676 0.59459459 0.77777778 0.73684211 0.8 0.82352941 0.75862069 0.8 0.75 0.70588235] mean value: 
0.7504003688753342 key: train_fscore value: [0.77018634 0.76595745 0.78205128 0.75641026 0.78032787 0.78594249 0.77192982 0.78125 0.76875 0.78846154] mean value: 0.7751267044561957 key: test_precision value: [0.66666667 0.52380952 0.7 0.63636364 0.66666667 0.77777778 0.78571429 0.7 0.70588235 0.63157895] mean value: 0.6794459857308155 key: train_precision value: [0.68131868 0.66666667 0.70930233 0.68604651 0.72121212 0.71098266 0.76388889 0.69832402 0.68715084 0.71929825] mean value: 0.7044190960204428 key: test_recall value: [0.875 0.6875 0.875 0.875 1. 0.875 0.73333333 0.93333333 0.8 0.8 ] mean value: 0.8454166666666667 key: train_recall value: [0.88571429 0.9 0.87142857 0.84285714 0.85 0.87857143 0.78014184 0.88652482 0.87234043 0.87234043] mean value: 0.8639918946301925 key: test_roc_auc value: [0.71875 0.53125 0.7375 0.67083333 0.73333333 0.80416667 0.77291667 0.77916667 0.74375 0.68125 ] mean value: 0.7172916666666667 key: train_roc_auc value: [0.73571429 0.725 0.75840932 0.72993921 0.76187943 0.76198075 0.76864235 0.75040527 0.73617021 0.76474164] mean value: 0.7492882472137791 key: test_jcc value: [0.60869565 0.42307692 0.63636364 0.58333333 0.66666667 0.7 0.61111111 0.66666667 0.6 0.54545455] mean value: 0.6041368534846796 key: train_jcc value: [0.62626263 0.62068966 0.64210526 0.60824742 0.63978495 0.64736842 0.62857143 0.64102564 0.62436548 0.65079365] mean value: 0.632921453718676 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01051235 0.01030493 0.01025033 0.01022387 0.01026034 0.01030755 0.01042366 0.01025248 0.01031613 0.01030803] mean value: 0.010315966606140137 key: score_time value: [0.00944209 0.00935793 0.0093224 0.00918984 0.00931573 0.00929213 0.00936365 0.0089705 0.00935435 0.00958991] mean value: 0.00931985378265381 key: test_mcc value: [0.19088543 0.438357 0.55 0.4365267 0.61608311 0.4184137 0.48527095 0.5612264 0.35983579 0.55 ] mean value: 0.46065990788811395 key: train_mcc value: [0.57858619 0.54486237 0.52431066 0.54736197 0.51744233 0.52431066 0.55262901 0.52460395 0.53764274 0.5252232 ] mean value: 0.5376973074762675 key: test_accuracy value: [0.59375 0.71875 0.77419355 0.70967742 0.80645161 0.70967742 0.74193548 0.77419355 0.67741935 0.77419355] mean value: 0.7280241935483871 key: train_accuracy value: [0.78928571 0.77142857 0.76156584 0.77224199 0.75800712 0.76156584 0.77580071 0.76156584 0.76868327 0.76156584] mean value: 0.7681710726995424 key: test_fscore value: [0.62857143 0.72727273 0.77419355 0.75675676 0.82352941 0.72727273 0.71428571 0.78787879 0.6875 0.77419355] mean value: 0.7401454650577042 key: train_fscore value: [0.79003559 0.78082192 0.76816609 0.78231293 0.76551724 0.76816609 0.78350515 0.77133106 0.77351916 0.77288136] mean value: 0.7756256583831929 key: test_precision value: [0.57894737 0.70588235 0.8 0.66666667 0.77777778 0.70588235 0.76923077 0.72222222 0.64705882 0.75 ] mean value: 0.7123668333730253 key: train_precision value: [0.78723404 0.75 0.74496644 0.74675325 0.74 0.74496644 0.76 0.74342105 0.76027397 0.74025974] mean value: 0.7517874940706537 
key: test_recall value: [0.6875 0.75 0.75 0.875 0.875 0.75 0.66666667 0.86666667 0.73333333 0.8 ] mean value: 0.7754166666666666 key: train_recall value: [0.79285714 0.81428571 0.79285714 0.82142857 0.79285714 0.79285714 0.80851064 0.80141844 0.78723404 0.80851064] mean value: 0.8012816616008105 key: test_roc_auc value: [0.59375 0.71875 0.775 0.70416667 0.80416667 0.70833333 0.73958333 0.77708333 0.67916667 0.775 ] mean value: 0.7275 key: train_roc_auc value: [0.78928571 0.77142857 0.7616768 0.77241641 0.7581307 0.7616768 0.77568389 0.76142351 0.76861702 0.76139818] mean value: 0.7681737588652482 key: test_jcc value: [0.45833333 0.57142857 0.63157895 0.60869565 0.7 0.57142857 0.55555556 0.65 0.52380952 0.63157895] mean value: 0.5902409102466311 key: train_jcc value: [0.65294118 0.64044944 0.62359551 0.6424581 0.62011173 0.62359551 0.6440678 0.62777778 0.63068182 0.62983425] mean value: 0.6335513105024437 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00951195 0.0097332 0.0096612 0.00966334 0.00966907 0.00986385 0.00983644 0.01014018 0.0087738 0.00984406] mean value: 0.00966970920562744 key: score_time value: [0.01151824 0.01224685 0.01311278 0.01185679 0.01232314 0.01154304 0.01158881 0.012007 0.01123166 0.01146412] mean value: 0.011889243125915527 key: test_mcc value: [0.38729833 0.12598816 0.16878989 0.29166667 0.15899721 0.28870546 0.22630095 0.55 0.35445878 0.28870546] mean value: 0.2840910904397822 key: train_mcc value: [0.5872142 0.60790321 0.62988855 0.58726379 0.58726379 0.60170267 0.64819964 0.5803804 0.59462628 0.62347871] mean value: 0.6047921251531794 key: test_accuracy value: [0.6875 0.5625 0.58064516 0.64516129 0.58064516 0.64516129 0.61290323 0.77419355 0.67741935 0.64516129] mean value: 0.6411290322580645 key: train_accuracy value: [0.79285714 0.80357143 0.81494662 0.79359431 0.79359431 0.80071174 0.82206406 0.79003559 0.79715302 0.8113879 ] mean value: 0.8019916115912558 key: test_fscore value: [0.72222222 0.53333333 0.55172414 0.64516129 0.60606061 0.66666667 0.53846154 0.77419355 0.64285714 0.62068966] mean value: 0.6301370141414636 key: train_fscore value: [0.8 0.80836237 0.81428571 0.79432624 0.79432624 0.8028169 0.83221477 0.79442509 0.80139373 0.816609 ] mean value: 0.8058760044273121 key: test_precision value: [0.65 0.57142857 0.61538462 0.66666667 0.58823529 0.64705882 0.63636364 0.75 0.69230769 0.64285714] mean value: 0.6460302442655383 key: train_precision value: [0.77333333 0.78911565 0.81428571 0.78873239 0.78873239 0.79166667 0.78980892 0.78082192 0.78767123 0.7972973 ] mean 
value: 0.7901465514456293 key: test_recall value: [0.8125 0.5 0.5 0.625 0.625 0.6875 0.46666667 0.8 0.6 0.6 ] mean value: 0.6216666666666667 key: train_recall value: [0.82857143 0.82857143 0.81428571 0.8 0.8 0.81428571 0.87943262 0.80851064 0.81560284 0.83687943] mean value: 0.8226139817629179 key: test_roc_auc value: [0.6875 0.5625 0.58333333 0.64583333 0.57916667 0.64375 0.60833333 0.775 0.675 0.64375 ] mean value: 0.6404166666666667 key: train_roc_auc value: [0.79285714 0.80357143 0.81494428 0.79361702 0.79361702 0.80075988 0.82185917 0.7899696 0.79708713 0.81129686] mean value: 0.8019579533941237 key: test_jcc value: [0.56521739 0.36363636 0.38095238 0.47619048 0.43478261 0.5 0.36842105 0.63157895 0.47368421 0.45 ] mean value: 0.46444634313055366 key: train_jcc value: [0.66666667 0.67836257 0.68674699 0.65882353 0.65882353 0.67058824 0.71264368 0.65895954 0.66860465 0.69005848] mean value: 0.6750277868263664 MCC on Blind test: 0.14 Accuracy on Blind test: 0.57 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01648831 0.01415229 0.01382375 0.01463223 0.0138309 0.01456952 0.01394272 0.01430678 0.01460671 0.01483178] mean value: 0.014518499374389648 key: score_time value: [0.01045418 0.00998831 0.00985622 0.00994062 0.00994372 0.00987506 0.00997186 0.0103848 0.01011252 0.01010108] mean value: 0.010062837600708007 key: test_mcc value: [0.40451992 0.438357 0.48333333 0.68826048 0.6125 0.4184137 0.55 0.69203857 0.48333333 0.43041423] mean value: 0.5201170568656933 key: train_mcc value: [0.73799581 0.72864578 0.72953394 0.74385734 0.70292136 0.75103885 0.71640396 0.68010159 0.7083207 0.71704623] mean value: 0.7215865560453796 key: test_accuracy value: [0.6875 0.71875 0.74193548 0.83870968 0.80645161 0.70967742 0.77419355 0.83870968 0.74193548 0.70967742] mean value: 0.7567540322580645 key: train_accuracy value: [0.86785714 0.86428571 0.86476868 0.87188612 0.85053381 0.87544484 0.85765125 0.83985765 0.85409253 0.85765125] mean value: 0.8604028978139299 key: test_fscore value: [0.73684211 0.70967742 0.75 0.85714286 0.8125 0.72727273 0.77419355 0.84848485 0.73333333 0.72727273] mean value: 0.7676719566511587 key: train_fscore value: [0.87285223 0.86524823 0.86428571 0.87234043 0.85517241 0.87364621 0.86206897 0.84320557 0.85614035 0.8630137 ] mean value: 0.8627973813561808 key: test_precision value: [0.63636364 0.73333333 0.75 0.78947368 0.8125 0.70588235 0.75 0.77777778 0.73333333 0.66666667] mean value: 0.735533078462645 key: train_precision value: [0.8410596 0.85915493 0.86428571 0.86619718 0.82666667 0.88321168 0.83892617 0.82876712 0.84722222 0.83443709] mean value: 
0.8489928381208813 key: test_recall value: [0.875 0.6875 0.75 0.9375 0.8125 0.75 0.8 0.93333333 0.73333333 0.8 ] mean value: 0.8079166666666666 key: train_recall value: [0.90714286 0.87142857 0.86428571 0.87857143 0.88571429 0.86428571 0.88652482 0.85815603 0.86524823 0.89361702] mean value: 0.8774974670719351 key: test_roc_auc value: [0.6875 0.71875 0.74166667 0.83541667 0.80625 0.70833333 0.775 0.84166667 0.74166667 0.7125 ] mean value: 0.7568750000000001 key: train_roc_auc value: [0.86785714 0.86428571 0.86476697 0.87190983 0.85065856 0.87540527 0.85754813 0.8397923 0.85405268 0.8575228 ] mean value: 0.8603799392097264 key: test_jcc value: [0.58333333 0.55 0.6 0.75 0.68421053 0.57142857 0.63157895 0.73684211 0.57894737 0.57142857] mean value: 0.6257769423558898 key: train_jcc value: [0.77439024 0.7625 0.76100629 0.77358491 0.74698795 0.77564103 0.75757576 0.72891566 0.74846626 0.75903614] mean value: 0.7588104238792632 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.1177218 1.43766284 1.54514217 1.15275574 1.26740909 1.14019227 1.27580643 1.12184954 1.25379801 1.11662126] mean value: 1.2428959131240844 key: score_time value: [0.01441884 0.01461029 0.02055955 0.01454592 0.01471639 0.01474786 0.01493645 0.01513934 0.0150795 0.0150938 ] mean value: 0.015384793281555176 key: test_mcc value: [0.31311215 0.44539933 0.54812195 0.6310315 0.6125 0.4184137 0.35416667 0.74896053 0.42321607 0.48954403] mean value: 0.4984465941438147 key: train_mcc value: [0.98581488 0.97142857 0.9929078 0.9929078 0.98586555 0.99290744 0.978869 0.97867167 0.98586412 0.978869 ] mean value: 0.9844105844205777 key: test_accuracy value: [0.65625 0.71875 0.77419355 0.80645161 0.80645161 0.70967742 0.67741935 0.87096774 0.70967742 0.74193548] mean value: 0.7471774193548387 key: train_accuracy value: [0.99285714 0.98571429 0.99644128 0.99644128 0.99288256 0.99644128 0.98932384 0.98932384 0.99288256 0.98932384] mean value: 0.9921631926792069 key: test_fscore value: [0.66666667 0.68965517 0.78787879 0.83333333 0.8125 0.72727273 0.66666667 0.875 0.66666667 0.75 ] mean value: 0.7475640020898642 key: train_fscore value: [0.9929078 0.98571429 0.99644128 0.99644128 0.9929078 0.99641577 0.98947368 0.98939929 0.99295775 0.98947368] mean 
value: 0.992213262962421 key: test_precision value: [0.64705882 0.76923077 0.76470588 0.75 0.8125 0.70588235 0.66666667 0.82352941 0.75 0.70588235] mean value: 0.7395456259426848 key: train_precision value: [0.98591549 0.98571429 0.9929078 0.9929078 0.98591549 1. 0.97916667 0.98591549 0.98601399 0.97916667] mean value: 0.9873623686771724 key: test_recall value: [0.6875 0.625 0.8125 0.9375 0.8125 0.75 0.66666667 0.93333333 0.6 0.8 ] mean value: 0.7625 key: train_recall value: [1. 0.98571429 1. 1. 1. 0.99285714 1. 0.9929078 1. 1. ] mean value: 0.9971479229989868 key: test_roc_auc value: [0.65625 0.71875 0.77291667 0.80208333 0.80625 0.70833333 0.67708333 0.87291667 0.70625 0.74375 ] mean value: 0.7464583333333333 key: train_roc_auc value: [0.99285714 0.98571429 0.9964539 0.9964539 0.9929078 0.99642857 0.98928571 0.98931104 0.99285714 0.98928571] mean value: 0.9921555217831813 key: test_jcc value: [0.5 0.52631579 0.65 0.71428571 0.68421053 0.57142857 0.5 0.77777778 0.5 0.6 ] mean value: 0.6024018379281537 key: train_jcc value: [0.98591549 0.97183099 0.9929078 0.9929078 0.98591549 0.99285714 0.97916667 0.97902098 0.98601399 0.97916667] mean value: 0.9845703015893307 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03034997 0.01905537 0.02076721 0.0199306 0.02213931 0.0209415 0.02394128 0.02460599 0.02585983 0.0239253 ] mean value: 0.023151636123657227 key: score_time value: [0.01116729 0.00906777 0.00881743 0.00873852 0.00879312 0.00883675 0.00890684 0.00884533 0.00888515 0.00888705] mean value: 0.009094524383544921 key: test_mcc value: [0.438357 0.625 0.35416667 0.6778302 0.6778302 0.71269665 0.48333333 0.42083333 0.42083333 0.48954403] mean value: 0.5300424751832439 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71875 0.8125 0.67741935 0.83870968 0.83870968 0.83870968 0.74193548 0.70967742 0.70967742 0.74193548] mean value: 0.7628024193548387 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72727273 0.8125 0.6875 0.84848485 0.84848485 0.86486486 0.73333333 0.70967742 0.70967742 0.75 ] mean value: 0.7691795461150299 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.70588235 0.8125 0.6875 0.82352941 0.82352941 0.76190476 0.73333333 0.6875 0.6875 0.70588235] mean value: 0.742906162464986 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.8125 0.6875 0.875 0.875 1. 0.73333333 0.73333333 0.73333333 0.8 ] mean value: 0.8 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.71875 0.8125 0.67708333 0.8375 0.8375 0.83333333 0.74166667 0.71041667 0.71041667 0.74375 ] mean value: 0.7622916666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57142857 0.68421053 0.52380952 0.73684211 0.73684211 0.76190476 0.57894737 0.55 0.55 0.6 ] mean value: 0.6293984962406015 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.47 Accuracy on Blind test: 0.74 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10765457 0.10504031 0.1088562 0.10588574 0.10681224 0.10710359 0.11069465 0.10770369 0.10900497 0.10556507] mean value: 0.10743210315704346 key: score_time value: [0.01778412 0.01823831 0.0179069 0.017802 0.01791954 0.0193615 0.0178206 0.01882005 0.0178678 0.01819921] mean value: 0.01817200183868408 key: test_mcc value: [0.19088543 0.50395263 0.55 0.6125 0.87866878 0.29069387 0.54812195 0.67916667 0.42083333 0.61608311] mean value: 0.5290905772900235 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.59375 0.75 0.77419355 0.80645161 0.93548387 0.64516129 0.77419355 0.83870968 0.70967742 0.80645161] mean value: 0.7634072580645161 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.62857143 0.73333333 0.77419355 0.8125 0.93333333 0.68571429 0.75862069 0.83870968 0.70967742 0.78571429] mean value: 0.7660368001483129 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.57894737 0.78571429 0.8 0.8125 1. 0.63157895 0.78571429 0.8125 0.6875 0.84615385] mean value: 0.7740608733371891 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6875 0.6875 0.75 0.8125 0.875 0.75 0.73333333 0.86666667 0.73333333 0.73333333] mean value: 0.7629166666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.59375 0.75 0.775 0.80625 0.9375 0.64166667 0.77291667 0.83958333 0.71041667 0.80416667] mean value: 0.763125 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.45833333 0.57894737 0.63157895 0.68421053 0.875 0.52173913 0.61111111 0.72222222 0.55 0.64705882] mean value: 0.6280201462736125 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01097584 0.01011252 0.00978446 0.0105629 0.01060104 0.01052165 0.01064515 0.01009536 0.01080298 0.01068449] mean value: 0.010478639602661132 key: score_time value: [0.00919628 0.00954986 0.0095408 0.00925446 0.00952196 0.00908875 0.00936627 0.0090847 0.00936127 0.00946689] mean value: 0.009343123435974121 key: test_mcc value: [0.12909944 0.12598816 0.10687275 0.61608311 0.35416667 0.35416667 0.29844172 0.48954403 0.09139077 0.225 ] mean value: 0.27907533194958034 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.5625 0.5625 0.5483871 0.80645161 0.67741935 0.67741935 0.64516129 0.74193548 0.5483871 0.61290323] mean value: 0.6383064516129032 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.61111111 0.53333333 0.5 0.82352941 0.6875 0.6875 0.56 0.75 0.46153846 0.6 ] mean value: 0.6214512317747612 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.55 0.57142857 0.58333333 0.77777778 0.6875 0.6875 0.7 0.70588235 0.54545455 0.6 ] mean value: 0.6408876580935404 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6875 0.5 0.4375 0.875 0.6875 0.6875 0.46666667 0.8 0.4 0.6 ] mean value: 0.6141666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.5625 0.5625 0.55208333 0.80416667 0.67708333 0.67708333 0.63958333 0.74375 0.54375 0.6125 ] mean value: 0.6375000000000001 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.44 0.36363636 0.33333333 0.7 0.52380952 0.52380952 0.38888889 0.6 0.3 0.42857143] mean value: 0.4602049062049062 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.58 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.49366856 1.48023534 1.56707311 1.49572015 1.5158124 1.51820087 1.50707102 1.48903179 1.50217819 1.47849536] mean value: 1.5047486782073975 key: score_time value: [0.09201431 0.09436655 0.09239435 0.0984962 0.09627295 0.09926844 0.0995295 0.09666634 0.09262013 0.09160447] mean value: 0.09532332420349121 key: test_mcc value: [0.44539933 0.51639778 0.6125 0.67916667 0.67916667 0.48527095 0.68826048 0.74896053 0.48527095 0.74166667] mean value: 0.6082060015726044 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71875 0.75 0.80645161 0.83870968 0.83870968 0.74193548 0.83870968 0.87096774 0.74193548 0.87096774] mean value: 0.8017137096774194 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.74285714 0.71428571 0.8125 0.83870968 0.83870968 0.76470588 0.81481481 0.875 0.71428571 0.86666667] mean value: 0.7982535290101703 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.68421053 0.83333333 0.8125 0.86666667 0.86666667 0.72222222 0.91666667 0.82352941 0.76923077 0.86666667] mean value: 0.8161692929533487 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.625 0.8125 0.8125 0.8125 0.8125 0.73333333 0.93333333 0.66666667 0.86666667] mean value: 0.78875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.71875 0.75 0.80625 0.83958333 0.83958333 0.73958333 0.83541667 0.87291667 0.73958333 0.87083333] mean value: 0.80125 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.59090909 0.55555556 0.68421053 0.72222222 0.72222222 0.61904762 0.6875 0.77777778 0.55555556 0.76470588] mean value: 0.6679706451958773 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.47 Accuracy on Blind test: 0.74 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, 
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.87081504 0.92136621 1.01795077 0.89831853 0.90612841 0.91984773 0.87696695 0.91162896 0.8944664 0.97775722] mean value: 0.9195246219635009 key: score_time value: [0.25071716 0.24670219 0.24876046 0.23476171 0.26595569 0.19230556 0.12388849 0.24812293 0.25460219 0.21983552] mean value: 0.22856519222259522 key: test_mcc value: [0.56360186 0.51639778 0.6778302 0.6778302 0.6125 0.54812195 0.68826048 0.74896053 0.29166667 0.68826048] mean value: 0.6013430149607514 key: train_mcc value: [0.91437902 0.88571429 0.91467803 0.89344886 0.9219233 0.9219233 0.90044081 0.90749278 0.90044081 0.89325701] mean value: 0.9053698202560589 key: test_accuracy value: [0.78125 0.75 0.83870968 0.83870968 0.80645161 0.77419355 0.83870968 0.87096774 0.64516129 0.83870968] mean value: 0.7982862903225807 key: train_accuracy value: [0.95714286 0.94285714 0.95729537 0.94661922 0.96085409 0.96085409 0.95017794 0.95373665 0.95017794 0.94661922] mean value: 0.9526334519572954 key: test_fscore value: [0.78787879 0.71428571 0.84848485 0.84848485 0.8125 0.78787879 0.81481481 0.875 0.64516129 0.81481481] mean value: 0.7949303906965197 key: train_fscore value: [0.95744681 0.94285714 0.95683453 0.94699647 0.96113074 0.96113074 0.95070423 0.9540636 0.95070423 0.94699647] mean value: 0.9528864955647521 key: test_precision value: [0.76470588 0.83333333 0.82352941 0.82352941 0.8125 0.76470588 0.91666667 0.82352941 0.625 0.91666667] mean value: 0.8104166666666667 key: train_precision 
value: [0.95070423 0.94285714 0.96376812 0.93706294 0.95104895 0.95104895 0.94405594 0.95070423 0.94405594 0.94366197] mean value: 0.947896840860711 key: test_recall value: [0.8125 0.625 0.875 0.875 0.8125 0.8125 0.73333333 0.93333333 0.66666667 0.73333333] mean value: 0.7879166666666666 key: train_recall value: [0.96428571 0.94285714 0.95 0.95714286 0.97142857 0.97142857 0.95744681 0.95744681 0.95744681 0.95035461] mean value: 0.9579837892603851 key: test_roc_auc value: [0.78125 0.75 0.8375 0.8375 0.80625 0.77291667 0.83541667 0.87291667 0.64583333 0.83541667] mean value: 0.7975 key: train_roc_auc value: [0.95714286 0.94285714 0.9572695 0.94665653 0.96089159 0.96089159 0.95015198 0.9537234 0.95015198 0.94660588] mean value: 0.9526342451874367 key: test_jcc value: [0.65 0.55555556 0.73684211 0.73684211 0.68421053 0.65 0.6875 0.77777778 0.47619048 0.6875 ] mean value: 0.6642418546365915 key: train_jcc value: [0.91836735 0.89189189 0.91724138 0.89932886 0.92517007 0.92517007 0.90604027 0.91216216 0.90604027 0.89932886] mean value: 0.9100741171391153 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra
Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02584529 0.00938845 0.00952554 0.00988483 0.00979161 0.00954032 0.01143622 0.01072431 0.01068521 0.01063943] mean value: 0.01174612045288086 key: score_time value: [0.01011896 0.00875497 0.00878572 0.0086236 0.00927162 0.00859261 0.01007676 0.00962615 0.00960374 0.00962329] mean value: 0.00930774211883545 key: test_mcc value: [0.19088543 0.438357 0.55 0.4365267 0.61608311 0.4184137 0.48527095 0.5612264 0.35983579 0.55 ] mean value: 0.46065990788811395 key: train_mcc value: [0.57858619 0.54486237 0.52431066 0.54736197 0.51744233 0.52431066 0.55262901 0.52460395 0.53764274 0.5252232 ] mean value: 0.5376973074762675 key: test_accuracy value: [0.59375 0.71875 0.77419355 0.70967742 0.80645161 0.70967742 0.74193548 0.77419355 0.67741935 0.77419355] mean value: 0.7280241935483871 key: train_accuracy value: [0.78928571 0.77142857 0.76156584 0.77224199 0.75800712 0.76156584 0.77580071 0.76156584 0.76868327 0.76156584] mean value: 0.7681710726995424 key: test_fscore value: [0.62857143 0.72727273 0.77419355 0.75675676 0.82352941 0.72727273 0.71428571 0.78787879 0.6875 0.77419355] mean value: 0.7401454650577042 key: train_fscore value: [0.79003559 0.78082192 0.76816609 0.78231293 0.76551724 0.76816609 0.78350515 0.77133106 0.77351916 0.77288136] mean value: 0.7756256583831929 key: test_precision value: [0.57894737 0.70588235 0.8 0.66666667 0.77777778 0.70588235 0.76923077 0.72222222 0.64705882 0.75 ] mean value: 0.7123668333730253 key: train_precision value: [0.78723404 0.75 0.74496644 0.74675325 0.74 0.74496644 0.76 0.74342105 0.76027397 0.74025974] mean value: 0.7517874940706537 
key: test_recall value: [0.6875 0.75 0.75 0.875 0.875 0.75 0.66666667 0.86666667 0.73333333 0.8 ] mean value: 0.7754166666666666 key: train_recall value: [0.79285714 0.81428571 0.79285714 0.82142857 0.79285714 0.79285714 0.80851064 0.80141844 0.78723404 0.80851064] mean value: 0.8012816616008105 key: test_roc_auc value: [0.59375 0.71875 0.775 0.70416667 0.80416667 0.70833333 0.73958333 0.77708333 0.67916667 0.775 ] mean value: 0.7275 key: train_roc_auc value: [0.78928571 0.77142857 0.7616768 0.77241641 0.7581307 0.7616768 0.77568389 0.76142351 0.76861702 0.76139818] mean value: 0.7681737588652482 key: test_jcc value: [0.45833333 0.57142857 0.63157895 0.60869565 0.7 0.57142857 0.55555556 0.65 0.52380952 0.63157895] mean value: 0.5902409102466311 key: train_jcc value: [0.65294118 0.64044944 0.62359551 0.6424581 0.62011173 0.62359551 0.6440678 0.62777778 0.63068182 0.62983425] mean value: 0.6335513105024437 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra 
Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.18638062 0.06737065 0.07680464 0.07340765 0.0783298 0.07052732 0.07229161 0.07644987 0.07185864 0.08797336] mean value: 0.08613941669464112 key: score_time value: [0.01080513 0.01068473 0.01079202 0.01073265 0.01104116 0.01060367 0.01058578 0.01099658 0.01063633 0.0119226 ] mean value: 0.010880064964294434 key: test_mcc value: [0.50395263 0.82717019 0.55 0.80833333 0.6125 0.74166667 0.6125 0.6778302 0.48333333 0.74689528] mean value: 0.6564181640197154 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75 0.90625 0.77419355 0.90322581 0.80645161 0.87096774 0.80645161 0.83870968 0.74193548 0.87096774] mean value: 0.8269153225806452 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73333333 0.89655172 0.77419355 0.90322581 0.8125 0.875 0.8 0.82758621 0.73333333 0.85714286] mean value: 0.8212866809682716 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.78571429 1. 0.8 0.93333333 0.8125 0.875 0.8 0.85714286 0.73333333 0.92307692] mean value: 0.8520100732600733 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6875 0.8125 0.75 0.875 0.8125 0.875 0.8 0.8 0.73333333 0.8 ] mean value: 0.7945833333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.75 0.90625 0.775 0.90416667 0.80625 0.87083333 0.80625 0.8375 0.74166667 0.86875 ] mean value: 0.8266666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57894737 0.8125 0.63157895 0.82352941 0.68421053 0.77777778 0.66666667 0.70588235 0.57894737 0.75 ] mean value: 0.7010040419676643 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.56 Accuracy on Blind test: 0.78 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04203153 0.05769587 0.06261158 0.06438947 0.03616428 0.06777334 0.03340292 0.05865383 0.03727388 0.05256343] mean value: 0.05125601291656494 key: score_time value: [0.01514578 0.02427006 0.0214963 0.01218319 0.01978111 0.01229453 0.02300239 0.01197886 0.01196885 0.02074003] mean value: 0.017286109924316406 key: test_mcc value: [0.38729833 0.31814238 0.54812195 0.48333333 0.61608311 0.49612132 0.55573827 0.6125 0.55573827 0.23939495] mean value: 0.48124719335387184 key: train_mcc value: [0.87877321 0.87877321 0.90044081 0.85801159 0.87197933 0.8647923 0.87954398 0.83632219 0.87902736 0.85798288] mean value: 0.870564685545068 key: test_accuracy value: [0.6875 0.65625 0.77419355 0.74193548 0.80645161 0.74193548 0.77419355 
0.80645161 0.77419355 0.61290323] mean value: 0.7376008064516129 key: train_accuracy value: [0.93928571 0.93928571 0.95017794 0.92882562 0.93594306 0.93238434 0.93950178 0.91814947 0.93950178 0.92882562] mean value: 0.9351881037112354 key: test_fscore value: [0.72222222 0.62068966 0.78787879 0.75 0.82352941 0.77777778 0.74074074 0.8 0.74074074 0.64705882] mean value: 0.7410638159826801 key: train_fscore value: [0.93992933 0.93862816 0.94964029 0.92957746 0.93617021 0.93238434 0.94076655 0.91814947 0.93950178 0.93006993] mean value: 0.9354817520572338 key: test_precision value: [0.65 0.69230769 0.76470588 0.75 0.77777778 0.7 0.83333333 0.8 0.83333333 0.57894737] mean value: 0.7380405387526131 key: train_precision value: [0.93006993 0.94890511 0.95652174 0.91666667 0.92957746 0.92907801 0.92465753 0.92142857 0.94285714 0.91724138] mean value: 0.9317003552171846 key: test_recall value: [0.8125 0.5625 0.8125 0.75 0.875 0.875 0.66666667 0.8 0.66666667 0.73333333] mean value: 0.7554166666666666 key: train_recall value: [0.95 0.92857143 0.94285714 0.94285714 0.94285714 0.93571429 0.95744681 0.91489362 0.93617021 0.94326241] mean value: 0.9394630192502533 key: test_roc_auc value: [0.6875 0.65625 0.77291667 0.74166667 0.80416667 0.7375 0.77083333 0.80625 0.77083333 0.61666667] mean value: 0.7364583333333333 key: train_roc_auc value: [0.93928571 0.93928571 0.95015198 0.92887538 0.93596758 0.93239615 0.93943769 0.91816109 0.93951368 0.92877406] mean value: 0.9351849037487335 key: test_jcc value: [0.56521739 0.45 0.65 0.6 0.7 0.63636364 0.58823529 0.66666667 0.58823529 0.47826087] mean value: 0.5922979152135163 key: train_jcc value: [0.88666667 0.88435374 0.90410959 0.86842105 0.88 0.87333333 0.88815789 0.84868421 0.88590604 0.86928105] mean value: 0.8788913574452522 MCC on Blind test: 0.36 Accuracy on Blind test: 0.68 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01258373 0.0108614 0.00933671 0.0089016 0.00908589 0.00921488 0.00913262 0.00914311 0.00911927 0.00905776] mean value: 0.00964369773864746 key: score_time value: [0.01155806 0.00890446 0.00873089 0.00843573 0.00856686 0.00853968 0.00856066 0.00847149 0.0085392 0.0085187 ] mean value: 0.008882570266723632 key: test_mcc value: [ 0.38729833 -0.06262243 0.48333333 0.29069387 0.57461167 0.35416667 0.76594169 0.58316015 0.67916667 0.4184137 ] mean value: 0.4474163646749253 key: train_mcc value: [0.5161854 0.50871556 0.50376414 0.46836906 0.48948125 0.56024518 0.51919225 0.52460395 0.47487913 0.56002251] mean value: 0.5125458434467222 key: test_accuracy value: [0.6875 0.46875 0.74193548 0.64516129 0.77419355 0.67741935 0.87096774 0.77419355 0.83870968 0.70967742] mean value: 0.7188508064516129 key: train_accuracy value: [0.75714286 0.75357143 0.75088968 0.73309609 0.74377224 0.77935943 0.75800712 0.76156584 0.7366548 0.77935943] mean value: 0.7553418912048805 key: test_fscore value: [0.72222222 0.4516129 0.75 0.68571429 0.81081081 0.6875 0.84615385 0.8 0.83870968 0.68965517] mean value: 0.728237891796012 key: train_fscore value: [0.76712329 0.7628866 0.76027397 0.7440273 0.75342466 0.7862069 0.77181208 0.77133106 0.74829932 0.78767123] mean value: 0.7653056407214348 key: test_precision value: [0.65 0.46666667 0.75 0.63157895 0.71428571 0.6875 1. 
0.7 0.8125 0.71428571] mean value: 0.7126817042606516 key: train_precision value: [0.73684211 0.73509934 0.73026316 0.7124183 0.72368421 0.76 0.73248408 0.74342105 0.71895425 0.7615894 ] mean value: 0.7354755893490372 key: test_recall value: [0.8125 0.4375 0.75 0.75 0.9375 0.6875 0.73333333 0.93333333 0.86666667 0.66666667] mean value: 0.7575 key: train_recall value: [0.8 0.79285714 0.79285714 0.77857143 0.78571429 0.81428571 0.81560284 0.80141844 0.78014184 0.81560284] mean value: 0.7977051671732522 key: test_roc_auc value: [0.6875 0.46875 0.74166667 0.64166667 0.76875 0.67708333 0.86666667 0.77916667 0.83958333 0.70833333] mean value: 0.7179166666666666 key: train_roc_auc value: [0.75714286 0.75357143 0.7510385 0.73325735 0.74392097 0.77948328 0.75780142 0.76142351 0.73649949 0.77922999] mean value: 0.7553368794326241 key: test_jcc value: [0.56521739 0.29166667 0.6 0.52173913 0.68181818 0.52380952 0.73333333 0.66666667 0.72222222 0.52631579] mean value: 0.5832788905729409 key: train_jcc value: [0.62222222 0.61666667 0.61325967 0.5923913 0.6043956 0.64772727 0.6284153 0.62777778 0.59782609 0.64971751] mean value: 0.620039941827292 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, 
n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01375198 0.01595616 0.01475716 0.01581788 0.01899552 0.01607966 0.01525974 0.01696014 0.01642799 0.01958275] mean value: 0.01635890007019043 key: score_time value: [0.00951004 0.01106167 0.01103091 0.01145887 0.01153207 0.01150727 0.01147628 0.0115335 0.01149893 0.01153302] mean value: 0.011214256286621094 key: test_mcc value: [0.56360186 0.37796447 0.6778302 0.6681531 0.61925228 0.55 0.49612132 0.71807033 0.4184137 0.37191715] mean value: 0.5461324430638016 key: train_mcc value: [0.66793226 0.78643686 0.73077387 0.58944915 0.79807813 0.75269046 0.64551514 0.70167408 0.75139391 0.7486327 ] mean value: 0.7172576566417463 key: test_accuracy value: [0.78125 0.6875 0.83870968 0.80645161 0.80645161 0.77419355 0.74193548 0.83870968 0.70967742 0.67741935] mean value: 0.7662298387096774 key: train_accuracy value: [0.82142857 0.89285714 0.86476868 0.76156584 0.89679715 0.8683274 0.80782918 0.84697509 0.87544484 0.86120996] mean value: 0.8497203863751907 key: test_fscore value: [0.77419355 0.66666667 0.84848485 0.76923077 0.8 0.77419355 0.69230769 0.85714286 0.68965517 0.70588235] mean value: 0.7577757455961998 key: train_fscore value: [0.79338843 0.89051095 0.86805556 0.68837209 0.89056604 0.85258964 0.775 0.85808581 0.87364621 0.87774295] mean value: 0.8367957671081703 key: test_precision value: [0.8 0.71428571 0.82352941 1. 
0.85714286 0.8 0.81818182 0.75 0.71428571 0.63157895] mean value: 0.7909004463029231 key: train_precision value: [0.94117647 0.91044776 0.84459459 0.98666667 0.944 0.96396396 0.93939394 0.80246914 0.88970588 0.78651685] mean value: 0.9008935268489424 key: test_recall value: [0.75 0.625 0.875 0.625 0.75 0.75 0.6 1. 0.66666667 0.8 ] mean value: 0.7441666666666666 key: train_recall value: [0.68571429 0.87142857 0.89285714 0.52857143 0.84285714 0.76428571 0.65957447 0.92198582 0.85815603 0.9929078 ] mean value: 0.8018338399189463 key: test_roc_auc value: [0.78125 0.6875 0.8375 0.8125 0.80833333 0.775 0.7375 0.84375 0.70833333 0.68125 ] mean value: 0.7672916666666667 key: train_roc_auc value: [0.82142857 0.89285714 0.86486829 0.76073961 0.89660588 0.86795846 0.80835866 0.84670719 0.87550659 0.86073961] mean value: 0.8495770010131712 key: test_jcc value: [0.63157895 0.5 0.73684211 0.625 0.66666667 0.63157895 0.52941176 0.75 0.52631579 0.54545455] mean value: 0.6142848766300778 key: train_jcc value: [0.65753425 0.80263158 0.76687117 0.5248227 0.80272109 0.74305556 0.63265306 0.75144509 0.77564103 0.78212291] mean value: 0.7239498408791925 MCC on Blind test: 0.43 Accuracy on Blind test: 0.69 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02493834 0.01682949 0.01681733 0.01732183 0.01644254 0.01673317 0.01562119 0.0168426 0.01635194 0.01678419] mean value: 0.01746826171875 key: score_time value: [0.01164889 0.0114696 0.01156235 0.01149487 0.01146197 0.01149297 0.01148677 0.01147938 0.01151204 0.01378536] mean value: 0.011739420890808105 key: test_mcc value: [0.25819889 0.31311215 0.37191715 0.60910959 0.6778302 0.4184137 0.55777335 0.57104024 0.24910095 0.37191715] mean value: 0.43984133753776633 key: train_mcc value: [0.68877552 0.80732823 0.70217171 0.64971048 0.76419699 0.70901046 0.62520923 0.56207051 0.42657096 0.73440191] mean value: 0.6669446002434639 key: test_accuracy value: [0.625 0.65625 0.67741935 0.77419355 0.83870968 0.70967742 0.74193548 0.74193548 0.58064516 0.67741935] mean value: 0.7023185483870967 key: train_accuracy value: [0.83214286 0.90357143 0.83274021 0.80427046 0.87900356 0.85409253 0.78647687 0.74733096 0.65480427 0.85765125] mean value: 0.8152084392475851 key: test_fscore value: [0.57142857 0.66666667 0.64285714 0.82051282 0.84848485 0.72727273 0.63636364 0.78947368 0.68292683 0.70588235] mean value: 0.7091869280006409 key: train_fscore value: [0.80658436 0.90459364 0.8 0.83282675 0.87022901 0.84981685 0.73451327 0.7965616 0.74406332 0.87261146] mean value: 0.8211800275313914 key: test_precision value: [0.66666667 0.64705882 0.75 0.69565217 0.82352941 0.70588235 1. 
0.65217391 0.53846154 0.63157895] mean value: 0.7111003827688442 key: train_precision value: [0.95145631 0.8951049 0.98947368 0.72486772 0.93442623 0.87218045 0.97647059 0.66826923 0.59243697 0.79190751] mean value: 0.8396593603744082 key: test_recall value: [0.5 0.6875 0.5625 1. 0.875 0.75 0.46666667 1. 0.93333333 0.8 ] mean value: 0.7575000000000001 key: train_recall value: [0.7 0.91428571 0.67142857 0.97857143 0.81428571 0.82857143 0.58865248 0.9858156 1. 0.97163121] mean value: 0.8453242147922999 key: test_roc_auc value: [0.625 0.65625 0.68125 0.76666667 0.8375 0.70833333 0.73333333 0.75 0.59166667 0.68125 ] mean value: 0.703125 key: train_roc_auc value: [0.83214286 0.90357143 0.83216819 0.80488855 0.87877406 0.85400203 0.78718338 0.74647923 0.65357143 0.85724417] mean value: 0.8150025329280648 key: test_jcc value: [0.4 0.5 0.47368421 0.69565217 0.73684211 0.57142857 0.46666667 0.65217391 0.51851852 0.54545455] mean value: 0.5560420704814297 key: train_jcc value: [0.67586207 0.82580645 0.66666667 0.71354167 0.77027027 0.7388535 0.58041958 0.66190476 0.59243697 0.7740113 ] mean value: 0.6999773243916024 MCC on Blind test: 0.38 Accuracy on Blind test: 0.66 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), 
('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.16071701 0.13766336 0.13898635 0.13800216 0.13920808 0.13892508 0.1380024 0.13671398 0.1370995 0.13823891] mean value: 0.14035568237304688 key: score_time value: [0.01500583 0.01504397 0.0150857 0.01502109 0.015064 0.01523542 0.01508236 0.01495957 0.0150423 0.01509142] mean value: 0.015063166618347168 key: test_mcc value: [0.56360186 0.44539933 0.61608311 0.67916667 0.55 0.74166667 0.54812195 0.54812195 0.48333333 0.61608311] mean value: 0.5791577998148137 key: train_mcc value: [0.99288247 0.97859639 0.9929078 0.97867167 0.9929078 0.9929078 1. 1. 0.99290744 1. ] mean value: 0.9921781379034591 key: test_accuracy value: [0.78125 0.71875 0.80645161 0.83870968 0.77419355 0.87096774 0.77419355 0.77419355 0.74193548 0.80645161] mean value: 0.7887096774193548 key: train_accuracy value: [0.99642857 0.98928571 0.99644128 0.98932384 0.99644128 0.99644128 1. 1. 0.99644128 1. ] mean value: 0.9960803253685816 key: test_fscore value: [0.78787879 0.68965517 0.82352941 0.83870968 0.77419355 0.875 0.75862069 0.75862069 0.73333333 0.78571429] mean value: 0.7825255596221703 key: train_fscore value: [0.99641577 0.98924731 0.99644128 0.98924731 0.99644128 0.99644128 1. 1. 0.99646643 1. ] mean value: 0.9960700668777009 key: test_precision value: [0.76470588 0.76923077 0.77777778 0.86666667 0.8 0.875 0.78571429 0.78571429 0.73333333 0.84615385] mean value: 0.8004296846943906 key: train_precision value: [1. 0.99280576 0.9929078 0.99280576 0.9929078 0.9929078 1. 1. 0.99295775 1. 
] mean value: 0.995729266152556 key: test_recall value: [0.8125 0.625 0.875 0.8125 0.75 0.875 0.73333333 0.73333333 0.73333333 0.73333333] mean value: 0.7683333333333333 key: train_recall value: [0.99285714 0.98571429 1. 0.98571429 1. 1. 1. 1. 1. 1. ] mean value: 0.9964285714285714 key: test_roc_auc value: [0.78125 0.71875 0.80416667 0.83958333 0.775 0.87083333 0.77291667 0.77291667 0.74166667 0.80416667] mean value: 0.788125 key: train_roc_auc value: [0.99642857 0.98928571 0.9964539 0.98931104 0.9964539 0.9964539 1. 1. 0.99642857 1. ] mean value: 0.9960815602836879 key: test_jcc value: [0.65 0.52631579 0.7 0.72222222 0.63157895 0.77777778 0.61111111 0.61111111 0.57894737 0.64705882] mean value: 0.6456123151014792 key: train_jcc value: [0.99285714 0.9787234 0.9929078 0.9787234 0.9929078 0.9929078 1. 1. 0.99295775 1. ] mean value: 0.9921985102101973 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05436349 0.06020713 0.06400919 0.07298946 0.06242132 0.05749798 0.06266236 0.07693434 0.07352424 0.07245946] mean value: 0.06570689678192139 key: score_time value: [0.02195573 0.02486134 0.03126049 0.02224779 0.0243659 0.02393746 0.02083087 0.02240777 0.04192734 0.01869297] mean value: 0.02524876594543457 key: test_mcc value: [0.56360186 0.72374686 0.42083333 0.87866878 0.67916667 0.6778302 0.48527095 0.48333333 0.48333333 0.55573827] mean value: 0.5951523594158408 key: train_mcc value: [0.97879618 0.97182532 0.9716269 0.95738969 0.95767878 0.9716269 0.9716269 0.97162977 0.965028 0.97887218] mean value: 0.9696100628980616 key: test_accuracy value: [0.78125 0.84375 0.70967742 0.93548387 0.83870968 0.83870968 0.74193548 0.74193548 0.74193548 0.77419355] mean value: 0.7947580645161291 key: train_accuracy value: [0.98928571 0.98571429 0.98576512 0.97864769 0.97864769 0.98576512 0.98576512 0.98576512 0.98220641 0.98932384] mean value: 0.9846886120996441 key: test_fscore value: [0.77419355 0.81481481 0.70967742 0.93333333 0.83870968 0.84848485 0.71428571 0.73333333 0.73333333 0.74074074] mean value: 0.7840906763487409 key: train_fscore value: [0.98916968 0.98550725 0.98561151 0.97841727 0.97826087 0.98561151 0.98591549 0.98571429 0.98194946 0.98924731] mean value: 0.984540462778581 key: test_precision value: [0.8 1. 0.73333333 1. 0.86666667 0.82352941 0.76923077 0.73333333 0.73333333 0.83333333] mean value: 0.8292760180995475 key: train_precision value: [1. 1. 0.99275362 0.98550725 0.99264706 0.99275362 0.97902098 0.99280576 1. 
1. ] mean value: 0.9935488285993815 key: test_recall value: [0.75 0.6875 0.6875 0.875 0.8125 0.875 0.66666667 0.73333333 0.73333333 0.66666667] mean value: 0.74875 key: train_recall value: [0.97857143 0.97142857 0.97857143 0.97142857 0.96428571 0.97857143 0.9929078 0.9787234 0.96453901 0.9787234 ] mean value: 0.975775075987842 key: test_roc_auc value: [0.78125 0.84375 0.71041667 0.9375 0.83958333 0.8375 0.73958333 0.74166667 0.74166667 0.77083333] mean value: 0.794375 key: train_roc_auc value: [0.98928571 0.98571429 0.98573961 0.97862209 0.97859676 0.98573961 0.98573961 0.98579027 0.9822695 0.9893617 ] mean value: 0.9846859169199594 key: test_jcc value: [0.63157895 0.6875 0.55 0.875 0.72222222 0.73684211 0.55555556 0.57894737 0.57894737 0.58823529] mean value: 0.650482886136911 key: train_jcc value: [0.97857143 0.97142857 0.97163121 0.95774648 0.95744681 0.97163121 0.97222222 0.97183099 0.96453901 0.9787234 ] mean value: 0.9695771318216628 MCC on Blind test: 0.51 Accuracy on Blind test: 0.75 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.05900478 0.05510592 0.04103112 0.07902026 0.08206344 0.0536263 0.03792715 0.04140687 0.09598207 0.07943511] mean value: 0.06246030330657959 key: score_time value: [0.02397704 0.0130322 0.0130856 0.02284026 0.02174592 0.0130434 0.01309609 0.01705003 0.02606058 0.02308249] mean value: 0.01870136260986328 key: test_mcc value: [0.19088543 0.25 0.48333333 0.48954403 0.43041423 0.22630095 0.29960206 0.80833333 0.42083333 0.42083333] mean value: 0.40200800428633815 key: train_mcc value: [0.98571429 0.98571429 0.99290744 0.98576494 0.9929078 0.98576494 0.99290744 0.98576494 0.99290744 0.98576494] mean value: 0.9886118480130845 key: test_accuracy value: [0.59375 0.625 0.74193548 0.74193548 0.70967742 0.61290323 0.64516129 0.90322581 0.70967742 0.70967742] mean value: 0.6992943548387097 key: train_accuracy value: [0.99285714 0.99285714 0.99644128 0.99288256 0.99644128 0.99288256 0.99644128 0.99288256 0.99644128 0.99288256] mean value: 0.9943009659379767 key: test_fscore value: [0.62857143 0.625 0.75 0.73333333 0.68965517 0.66666667 0.66666667 0.90322581 0.70967742 0.70967742] mean value: 0.7082473912813179 key: train_fscore value: [0.99285714 0.99285714 0.99641577 0.99285714 0.99644128 0.99285714 0.99646643 0.9929078 0.99646643 0.9929078 ] mean value: 0.9943034088204372 key: test_precision value: [0.57894737 0.625 0.75 0.78571429 0.76923077 0.6 0.61111111 0.875 0.6875 0.6875 ] mean value: 0.6970003534477218 key: train_precision value: [0.99285714 0.99285714 1. 
0.99285714 0.9929078 0.99285714 0.99295775 0.9929078 0.99295775 0.9929078 ] mean value: 0.9936067468641637 key: test_recall value: [0.6875 0.625 0.75 0.6875 0.625 0.75 0.73333333 0.93333333 0.73333333 0.73333333] mean value: 0.7258333333333333 key: train_recall value: [0.99285714 0.99285714 0.99285714 0.99285714 1. 0.99285714 1. 0.9929078 1. 0.9929078 ] mean value: 0.9950101317122594 key: test_roc_auc value: [0.59375 0.625 0.74166667 0.74375 0.7125 0.60833333 0.64791667 0.90416667 0.71041667 0.71041667] mean value: 0.6997916666666667 key: train_roc_auc value: [0.99285714 0.99285714 0.99642857 0.99288247 0.9964539 0.99288247 0.99642857 0.99288247 0.99642857 0.99288247] mean value: 0.9942983789260386 key: test_jcc value: [0.45833333 0.45454545 0.6 0.57894737 0.52631579 0.5 0.5 0.82352941 0.55 0.55 ] mean value: 0.554167135753823 key: train_jcc value: [0.9858156 0.9858156 0.99285714 0.9858156 0.9929078 0.9858156 0.99295775 0.98591549 0.99295775 0.98591549] mean value: 0.988677383449634 MCC on Blind test: 0.25 Accuracy on Blind test: 0.63 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.53291035 0.51292086 0.51562953 0.50803471 0.51853633 0.51748157 0.52107215 0.5132122 0.51350141 0.51845551] mean value: 0.5171754598617554 key: score_time value: [0.01048017 0.00943899 0.00925374 0.00941324 0.00959206 0.01012707 0.01015234 0.0094924 0.00954914 0.0099051 ] mean value: 0.009740424156188966 key: test_mcc value: [0.625 0.69991324 0.6125 0.74166667 0.6125 0.6778302 0.55573827 0.54812195 0.48333333 0.61608311] mean value: 0.6172686782122512 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.84375 0.80645161 0.87096774 0.80645161 0.83870968 0.77419355 0.77419355 0.74193548 0.80645161] mean value: 0.8075604838709677 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8125 0.82758621 0.8125 0.875 0.8125 0.84848485 0.74074074 0.75862069 0.73333333 0.78571429] mean value: 0.8006980104824932 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8125 0.92307692 0.8125 0.875 0.8125 0.82352941 0.83333333 0.78571429 0.73333333 0.84615385] mean value: 0.8257641133376428 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.75 0.8125 0.875 0.8125 0.875 0.66666667 0.73333333 0.73333333 0.73333333] mean value: 0.7804166666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.8125 0.84375 0.80625 0.87083333 0.80625 0.8375 0.77083333 0.77291667 0.74166667 0.80416667] mean value: 0.8066666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68421053 0.70588235 0.68421053 0.77777778 0.68421053 0.73684211 0.58823529 0.61111111 0.57894737 0.64705882] mean value: 0.6698486412108703 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.53 Accuracy on Blind test: 0.76 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), 
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.04052234 0.02345443 0.02418399 0.02380633 0.02393818 0.02422571 0.02493715 0.02415442 0.02463698 0.02459621] mean value: 0.0258455753326416 key: score_time value: [0.01229191 0.01268578 0.01460195 0.01458263 0.01477909 0.01717997 0.01701856 0.0148387 0.01225853 0.01511025] mean value: 0.014534735679626464 key: test_mcc value: [0.26967994 0.31814238 0.42083333 0.42083333 0.35445878 0.46159086 0.48954403 0.69203857 0.23012754 0.19266866] mean value: 0.38499174308021394 key: train_mcc value: [0.89802651 0.78050971 0.93120324 0.85402471 0.72348814 0.95136724 0.76926429 0.83529602 0.74016312 0.78697838] mean value: 0.8270321372218536 key: test_accuracy value: [0.625 0.65625 0.70967742 0.70967742 0.67741935 0.70967742 0.74193548 0.83870968 0.61290323 0.58064516] mean value:
0.6861895161290322 key: train_accuracy value: [0.94642857 0.87857143 0.96441281 0.92170819 0.84341637 0.97508897 0.87188612 0.91103203 0.85409253 0.88256228] mean value: 0.9049199288256228 key: test_fscore value: [0.68421053 0.68571429 0.70967742 0.70967742 0.70588235 0.76923077 0.75 0.84848485 0.625 0.64864865] mean value: 0.7136526270045196 key: train_fscore value: [0.94915254 0.89171975 0.96551724 0.92715232 0.86419753 0.97560976 0.88679245 0.91856678 0.87306502 0.8952381 ] mean value: 0.9147011472610135 key: test_precision value: [0.59090909 0.63157895 0.73333333 0.73333333 0.66666667 0.65217391 0.70588235 0.77777778 0.58823529 0.54545455] mean value: 0.662534525494547 key: train_precision value: [0.90322581 0.8045977 0.93333333 0.86419753 0.76086957 0.95238095 0.79661017 0.84939759 0.77472527 0.81034483] mean value: 0.8449682751561366 key: test_recall value: [0.8125 0.75 0.6875 0.6875 0.75 0.9375 0.8 0.93333333 0.66666667 0.8 ] mean value: 0.7825 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.625 0.65625 0.71041667 0.71041667 0.675 0.70208333 0.74375 0.84166667 0.61458333 0.5875 ] mean value: 0.6866666666666666 key: train_roc_auc value: [0.94642857 0.87857143 0.96453901 0.92198582 0.84397163 0.9751773 0.87142857 0.91071429 0.85357143 0.88214286] mean value: 0.9048530901722391 key: test_jcc value: [0.52 0.52173913 0.55 0.55 0.54545455 0.625 0.6 0.73684211 0.45454545 0.48 ] mean value: 0.5583581235697941 key: train_jcc value: [0.90322581 0.8045977 0.93333333 0.86419753 0.76086957 0.95238095 0.79661017 0.84939759 0.77472527 0.81034483] mean value: 0.8449682751561366 MCC on Blind test: 0.16 Accuracy on Blind test: 0.59 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, 
subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02280617 0.01443529 0.01433778 0.01460624 0.01442766 0.03603244 0.03605747 0.03566337 0.03552485 0.03572798] mean value: 0.025961923599243163 key: score_time value: [0.01775074 0.01187015 0.01184273 0.01180935 0.0117538 0.02118683 0.02436876 0.02252245 0.02214885 0.02132249] mean value: 0.01765761375427246 key: test_mcc value: [0.31311215 0.37796447 0.54812195 0.6125 0.6778302 0.4184137 0.61608311 0.82285074 0.48527095 0.48333333] mean value: 0.5355480607549098 key: train_mcc value: [0.83590622 0.82890983 0.80785208 0.78681467 0.79361702 0.82917933 0.80080045 0.78647416 0.83680633 0.77951762] mean value: 0.808587771026312 key: test_accuracy value: [0.65625 0.6875 0.77419355 0.80645161 0.83870968 0.70967742 0.80645161 0.90322581 0.74193548 0.74193548] mean value: 0.766633064516129 key: train_accuracy value: [0.91785714 0.91428571 0.90391459 0.89323843 0.89679715 0.91459075 0.90035587 0.89323843 0.91814947 0.88967972] mean value: 0.9042107269954245 key: test_fscore value: [0.66666667 0.66666667 0.78787879 0.8125 0.84848485 0.72727273 0.78571429 0.90909091 0.71428571 0.73333333] mean value: 0.7651893939393939 key: train_fscore value: [0.91872792 0.91304348 0.90391459 0.8943662 0.89679715 0.91428571 0.9 0.89361702 0.91986063 0.89122807] mean value: 0.9045840767326006 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:136: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation:
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:139: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_precision value: [0.64705882 0.71428571 0.76470588 0.8125 0.82352941 0.70588235 0.84615385 0.83333333 0.76923077 0.73333333] mean value: 0.7650013466925232 key: train_precision value: [0.90909091 0.92647059 0.90070922 0.88194444 0.89361702 0.91428571 0.90647482 0.89361702 0.90410959 0.88194444] mean value: 0.9012263772097134 key: test_recall value: [0.6875 0.625 0.8125 0.8125 0.875 0.75 0.73333333 1. 0.66666667 0.73333333] mean value: 0.7695833333333333 key: train_recall value: [0.92857143 0.9 0.90714286 0.90714286 0.9 0.91428571 0.89361702 0.89361702 0.93617021 0.90070922] mean value: 0.9081256332320162 key: test_roc_auc value: [0.65625 0.6875 0.77291667 0.80625 0.8375 0.70833333 0.80416667 0.90625 0.73958333 0.74166667] mean value: 0.7660416666666667 key: train_roc_auc value: [0.91785714 0.91428571 0.90392604 0.89328774 0.89680851 0.91458967 0.90037994 0.89323708 0.91808511 0.88964032] mean value: 0.904209726443769 key: test_jcc value: [0.5 0.5 0.65 0.68421053 0.73684211 0.57142857 0.64705882 0.83333333 0.55555556 0.57894737] mean value: 0.6257376283846872 key: train_jcc value: [0.8496732 0.84 0.82467532 0.8089172 0.81290323 0.84210526 0.81818182 0.80769231 0.8516129 0.80379747] mean value: 0.8259558711160642 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic
RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', 
transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.25096083 0.24789667 0.25337172 0.24919391 0.26491046 0.25964808 0.25687599 0.33584547 0.25059628 0.25279713] mean value: 0.2622096538543701 key: score_time value: [0.0240047 0.02082849 0.01891971 0.02080178 0.02297473 0.02297497 0.02118254 0.02063465 0.02154756 0.02372384] mean value: 0.021759295463562013 key: test_mcc value: [0.38729833 0.37796447 0.54812195 0.6125 0.6778302 0.48527095 0.61608311 0.69203857 0.48333333 0.48527095] mean value: 0.5365711870728687 key: train_mcc value: [0.68683657 0.82890983 0.68713898 0.78681467 0.68682877 0.69395613 0.65160982 0.65141613 0.70131788 0.72326575] mean value: 0.7098094526353244 key: test_accuracy value: [0.6875 0.6875 0.77419355 0.80645161 0.83870968 0.74193548 0.80645161 0.83870968 0.74193548 0.74193548] mean value: 0.7665322580645161 key: train_accuracy value: [0.84285714 0.91428571 0.84341637 0.89323843 0.84341637 0.84697509 0.82562278 0.82562278 0.85053381 0.86120996] mean value: 0.8547178444331469 key: test_fscore value: [0.72222222 0.66666667 0.78787879 0.8125 0.84848485 0.76470588 0.78571429 0.84848485 0.73333333 0.71428571] mean value: 0.7684276589423649 key: train_fscore value: [0.84722222 0.91304348 0.84507042 0.8943662 0.84285714 0.84587814 0.82926829 0.82437276 0.85314685 0.8650519 ] mean value: 0.8560277408059859 key: test_precision value: [0.65 0.71428571 0.76470588 0.8125 0.82352941 0.72222222 
0.84615385 0.77777778 0.73333333 0.76923077] mean value: 0.761373895712131 key: train_precision value: [0.82432432 0.92647059 0.83333333 0.88194444 0.84285714 0.84892086 0.81506849 0.83333333 0.84137931 0.84459459] mean value: 0.8492226427927332 key: test_recall value: [0.8125 0.625 0.8125 0.8125 0.875 0.8125 0.73333333 0.93333333 0.73333333 0.66666667] mean value: 0.7816666666666666 key: train_recall value: [0.87142857 0.9 0.85714286 0.90714286 0.84285714 0.84285714 0.84397163 0.81560284 0.86524823 0.88652482] mean value: 0.8632776089159068 key: test_roc_auc value: [0.6875 0.6875 0.77291667 0.80625 0.8375 0.73958333 0.80416667 0.84166667 0.74166667 0.73958333] mean value: 0.7658333333333334 key: train_roc_auc value: [0.84285714 0.91428571 0.84346505 0.89328774 0.84341439 0.84696049 0.82555724 0.82565856 0.85048126 0.86111955] mean value: 0.854708713272543 key: test_jcc value: [0.56521739 0.5 0.65 0.68421053 0.73684211 0.61904762 0.64705882 0.73684211 0.57894737 0.55555556] mean value: 0.6273721494700092 key: train_jcc value: [0.73493976 0.84 0.73170732 0.8089172 0.72839506 0.73291925 0.70833333 0.70121951 0.74390244 0.76219512] mean value: 0.749252899645239 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None,
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02953863 0.03505015 0.03531361 0.03359294 0.0331347 0.03402448 0.03351879 0.03452086 0.04119611 0.02893472] mean value: 0.0338824987411499 key: score_time value: [0.01170063 0.0165534 0.01207328 0.01180673 0.0122366 0.01206207 0.01204228 0.01204181 0.01195955 0.01177216] mean value: 0.012424850463867187 key: test_mcc value: [0.438357 0.438357 0.48333333 0.6778302 0.48527095 0.48333333 0.68826048 0.69203857 0.48527095 0.42083333] mean value: 0.5292885147430821 key: train_mcc value: [0.72144698 0.72864578 0.75089924 0.70836501 0.70111973 0.71556015 0.70106383 0.70836501 0.71535695 0.75089924] mean value: 0.7201721914786785 key: test_accuracy value: [0.71875 0.71875 0.74193548 0.83870968 0.74193548 0.74193548 0.83870968 0.83870968 0.74193548 0.70967742] mean value: 0.7631048387096775 key: train_accuracy value: [0.86071429 0.86428571 0.87544484 0.85409253 0.85053381 0.85765125 0.85053381 0.85409253 0.85765125 0.87544484] mean value: 0.8600444839857652 key: test_fscore value: [0.72727273 0.70967742 0.75 0.84848485 0.76470588 0.75 0.81481481 0.84848485 0.71428571 0.70967742] mean value: 0.7637403674405572 key: train_fscore value: [0.86120996 0.86524823 0.87455197 0.85512367 0.84892086 0.85507246 0.85106383 0.85304659 0.85915493 0.87632509] mean value: 0.859971760736446 key: test_precision value: [0.70588235 0.73333333 0.75 0.82352941 0.72222222 0.75 0.91666667 0.77777778 0.76923077 0.6875 ] mean value: 0.7636142533936652 key: train_precision value: [0.85815603 0.85915493 0.87769784 0.84615385 0.85507246 0.86764706 0.85106383 0.86231884 0.85314685 
0.87323944] mean value: 0.8603651128551885 key: test_recall value: [0.75 0.6875 0.75 0.875 0.8125 0.75 0.73333333 0.93333333 0.66666667 0.73333333] mean value: 0.7691666666666667 key: train_recall value: [0.86428571 0.87142857 0.87142857 0.86428571 0.84285714 0.84285714 0.85106383 0.84397163 0.86524823 0.87943262] mean value: 0.8596859169199595 key: test_roc_auc value: [0.71875 0.71875 0.74166667 0.8375 0.73958333 0.74166667 0.83541667 0.84166667 0.73958333 0.71041667] mean value: 0.7625 key: train_roc_auc value: [0.86071429 0.86428571 0.8754306 0.85412867 0.85050659 0.85759878 0.85053191 0.85412867 0.85762411 0.8754306 ] mean value: 0.8600379939209727 key: test_jcc value: [0.57142857 0.55 0.6 0.73684211 0.61904762 0.6 0.6875 0.73684211 0.55555556 0.55 ] mean value: 0.6207215956558062 key: train_jcc value: [0.75625 0.7625 0.77707006 0.74691358 0.7375 0.74683544 0.74074074 0.74375 0.75308642 0.77987421] mean value: 0.7544520461309461 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.92847991 0.77967024 0.77496982 0.87230301 0.79569674 0.93908501 0.77330136 0.76536059 0.965801 0.76891041] mean value: 0.8363578081130981 key: score_time value: [0.01461387 0.01185727 0.01436329 0.01196814 0.01442528 0.01206875 0.01191068 0.01198006 0.01440215 0.01197433] mean value: 0.012956380844116211 key: test_mcc value: [0.12909944 0.37796447 0.4184137 0.74689528 0.48527095 0.4184137 0.74689528 0.63696156 0.46159086 0.48333333] mean value: 0.4904838594649526 key: train_mcc value: [0.89342711 0.64390928 0.80070922 0.63786232 0.77937079 0.67973658 0.57566395 0.63712378 0.9929078 0.77938197] mean value: 0.7420092792860252 key: test_accuracy value: [0.5625 0.6875 0.70967742 0.87096774 0.74193548 0.70967742 0.87096774 0.80645161 0.70967742 0.74193548] mean value: 0.7411290322580645 key: train_accuracy value: [0.94642857 0.82142857 0.90035587 0.81850534 0.88967972 0.83985765 0.78647687 0.81850534 0.99644128 0.88967972] mean value: 
0.8707358922216574 key: test_fscore value: [0.61111111 0.66666667 0.72727273 0.88235294 0.76470588 0.72727273 0.85714286 0.82352941 0.60869565 0.73333333] mean value: 0.7402083310267453 key: train_fscore value: [0.94736842 0.82638889 0.9 0.82229965 0.88888889 0.83985765 0.7972973 0.82105263 0.99644128 0.88967972] mean value: 0.8729274426961431 key: test_precision value: [0.55 0.71428571 0.70588235 0.83333333 0.72222222 0.70588235 0.92307692 0.73684211 0.875 0.73333333] mean value: 0.7499858337397037 key: train_precision value: [0.93103448 0.80405405 0.9 0.80272109 0.89208633 0.83687943 0.76129032 0.8125 1. 0.89285714] mean value: 0.8633422854245202 key: test_recall value: [0.6875 0.625 0.75 0.9375 0.8125 0.75 0.8 0.93333333 0.46666667 0.73333333] mean value: 0.7495833333333334 key: train_recall value: [0.96428571 0.85 0.9 0.84285714 0.88571429 0.84285714 0.83687943 0.82978723 0.9929078 0.88652482] mean value: 0.8831813576494427 key: test_roc_auc value: [0.5625 0.6875 0.70833333 0.86875 0.73958333 0.70833333 0.86875 0.81041667 0.70208333 0.74166667] mean value: 0.7397916666666666 key: train_roc_auc value: [0.94642857 0.82142857 0.90035461 0.81859169 0.88966565 0.83986829 0.78629686 0.81846505 0.9964539 0.88969098] mean value: 0.8707244174265452 key: test_jcc value: [0.44 0.5 0.57142857 0.78947368 0.61904762 0.57142857 0.75 0.7 0.4375 0.57894737] mean value: 0.595782581453634 key: train_jcc value: [0.9 0.70414201 0.81818182 0.69822485 0.8 0.72392638 0.66292135 0.69642857 0.9929078 0.80128205] mean value: 0.7798014834898911 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01343632 0.01282287 0.00962019 0.00936198 0.00911808 0.00918531 0.00927401 0.00933409 0.0091629 0.00921702] mean value: 0.010053277015686035 key: score_time value: [0.01177359 0.00929523 0.00898552 0.0086925 0.00870514 0.00878668 0.00875974 0.00862837 0.00862598 0.00870895] mean value: 0.009096169471740722 key: test_mcc value: [0.40451992 0.19088543 0.29844172 0.48527095 0.46159086 0.42321607 0.48527095 0.53006813 0.55 0.42083333] mean value: 0.4250097354571631 key: train_mcc value: [0.47497405 0.48038446 0.52044165 0.4622929 0.44775571 0.54142123 0.52339686 0.51032449 0.45970151 0.49577468] mean value: 0.4916467544879524 key: test_accuracy value: [0.6875 0.59375 0.64516129 0.74193548 0.70967742 0.70967742 0.74193548 0.74193548 0.77419355 0.70967742] mean value: 0.7055443548387097 key: train_accuracy value: [0.72857143 0.725 0.7544484 0.72597865 0.71886121 0.76512456 0.76156584 0.74733096 0.72241993 0.74377224] mean value: 0.7393073207930859 key: test_fscore value: [0.73684211 0.62857143 0.7027027 0.76470588 0.76923077 0.74285714 0.71428571 0.77777778 0.77419355 0.70967742] mean value: 0.732084449078357 key: train_fscore value: [0.76100629 0.76595745 0.77669903 0.75080906 0.74433657 0.78571429 0.76655052 0.77602524 0.75471698 0.76623377] mean value: 0.7648049188632132 key: test_precision value: [0.63636364 0.57894737 0.61904762 0.72222222 0.65217391 0.68421053 0.76923077 0.66666667 0.75 0.6875 ] mean value: 0.6766362721311234 key: train_precision value: [0.67977528 0.66666667 0.71005917 0.68639053 0.68047337 0.7202381 0.75342466 0.69886364 0.6779661 0.70658683] mean 
value: 0.6980444341666818 key: test_recall value: [0.875 0.6875 0.8125 0.8125 0.9375 0.8125 0.66666667 0.93333333 0.8 0.73333333] mean value: 0.8070833333333334 key: train_recall value: [0.86428571 0.9 0.85714286 0.82857143 0.82142857 0.86428571 0.78014184 0.87234043 0.85106383 0.83687943] mean value: 0.8476139817629179 key: test_roc_auc value: [0.6875 0.59375 0.63958333 0.73958333 0.70208333 0.70625 0.73958333 0.74791667 0.775 0.71041667] mean value: 0.7041666666666666 key: train_roc_auc value: [0.72857143 0.725 0.75481256 0.72634245 0.71922492 0.76547619 0.76149949 0.7468845 0.72196049 0.74343972] mean value: 0.7393211752786222 key: test_jcc value: [0.58333333 0.45833333 0.54166667 0.61904762 0.625 0.59090909 0.55555556 0.63636364 0.63157895 0.55 ] mean value: 0.5791788182577656 key: train_jcc value: [0.6142132 0.62068966 0.63492063 0.60103627 0.59278351 0.64705882 0.62146893 0.63402062 0.60606061 0.62105263] mean value: 0.6193304868926621 MCC on Blind test: 0.28 Accuracy on Blind test: 0.65 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00932527 0.00929976 0.00981283 0.00938082 0.01028895 0.00962377 0.0106163 0.00959873 0.00959492 0.0094955 ] mean value: 0.009703683853149413 key: score_time value: [0.00870562 0.00866961 0.00908709 0.00865245 0.00882196 0.00876093 0.00897956 0.00915599 0.00876236 0.00909448] mean value: 0.008869004249572755 key: test_mcc value: [0.12909944 0.50395263 0.6125 0.4184137 0.51837044 0.55 0.4184137 0.44824996 0.35983579 0.6125 ] mean value: 0.4571335673572818 key: train_mcc value: [0.56603562 0.52531582 0.53306083 0.57396568 0.5248687 0.56166311 0.53352641 0.52595275 0.56002251 0.54704118] mean value: 0.5451452600049477 key: test_accuracy value: [0.5625 0.75 0.80645161 0.70967742 0.74193548 0.77419355 0.70967742 0.70967742 0.67741935 0.80645161] mean value: 0.7247983870967742 key: train_accuracy value: [0.78214286 0.76071429 0.76512456 0.78647687 0.76156584 0.77935943 0.76512456 0.76156584 0.77935943 0.77224199] mean value: 0.7713675648195221 key: test_fscore value: [0.61111111 0.73333333 0.8125 0.72727273 0.78947368 0.77419355 0.68965517 0.74285714 0.6875 0.8 ] mean value: 0.7367896719585731 key: train_fscore value: [0.79037801 0.77441077 0.7755102 0.79166667 0.76975945 0.78911565 0.77852349 0.77441077 0.78767123 0.78378378] mean value: 0.7815230029466407 key: test_precision value: [0.55 0.78571429 0.8125 0.70588235 0.68181818 0.8 0.71428571 0.65 0.64705882 0.8 ] mean value: 0.714725935828877 key: train_precision value: [0.7615894 0.73248408 0.74025974 0.77027027 0.74172185 0.75324675 0.7388535 0.73717949 0.7615894 0.7483871 ] mean value: 0.7485581589599934 key: test_recall 
value: [0.6875 0.6875 0.8125 0.75 0.9375 0.75 0.66666667 0.86666667 0.73333333 0.8 ] mean value: 0.7691666666666667 key: train_recall value: [0.82142857 0.82142857 0.81428571 0.81428571 0.8 0.82857143 0.82269504 0.81560284 0.81560284 0.82269504] mean value: 0.8176595744680851 key: test_roc_auc value: [0.5625 0.75 0.80625 0.70833333 0.73541667 0.775 0.70833333 0.71458333 0.67916667 0.80625 ] mean value: 0.7245833333333334 key: train_roc_auc value: [0.78214286 0.76071429 0.76529889 0.78657548 0.76170213 0.77953394 0.76491895 0.76137285 0.77922999 0.7720618 ] mean value: 0.771355116514691 key: test_jcc value: [0.44 0.57894737 0.68421053 0.57142857 0.65217391 0.63157895 0.52631579 0.59090909 0.52380952 0.66666667] mean value: 0.5866040397436278 key: train_jcc value: [0.65340909 0.63186813 0.63333333 0.65517241 0.62569832 0.65168539 0.63736264 0.63186813 0.64971751 0.64444444] mean value: 0.641455941498394 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00917029 0.00897074 0.00953841 0.01000524 0.00936627 0.01002192 0.01007271 0.01017094 0.00896072 0.00896215] mean value: 0.009523940086364747 key: score_time value: [0.011343 0.01217747 0.01493287 0.01539063 0.01501703 0.01552916 0.01544309 0.01240277 0.01429534 0.01415873] mean value: 0.014069008827209472 key: test_mcc value: [0.19738551 0.19738551 0.1784296 0.29166667 0.09283444 0.35416667 0.22630095 0.43041423 0.4184137 0.36121114] mean value: 0.27482084138853546 key: train_mcc value: [0.5503511 0.55068879 0.5873797 0.55159836 0.56592685 0.58718338 0.59672888 0.5873797 0.55168747 0.57313743] mean value: 0.5702061665452179 key: test_accuracy value: [0.59375 0.59375 0.58064516 0.64516129 0.5483871 0.67741935 0.61290323 0.70967742 0.70967742 0.67741935] mean value: 0.6348790322580645 key: train_accuracy value: [0.775 0.775 0.79359431 0.77580071 0.78291815 0.79359431 0.79715302 0.79359431 0.77580071 0.78647687] mean value: 0.7848932384341637 key: test_fscore value: [0.64864865 0.51851852 0.51851852 0.64516129 0.58823529 0.6875 0.53846154 0.72727273 0.68965517 0.61538462] mean value: 0.6177356323658587 key: train_fscore value: [0.77894737 0.7804878 0.78985507 0.77419355 0.77978339 0.79285714 0.80677966 0.7972028 0.77894737 0.79020979] mean value: 0.7869263947359504 key: test_precision value: [0.57142857 0.63636364 0.63636364 0.66666667 0.55555556 0.6875 0.63636364 0.66666667 0.71428571 0.72727273] mean value: 0.6498466810966811 key: train_precision value: [0.76551724 0.76190476 0.80147059 0.77697842 0.78832117 0.79285714 0.77272727 0.7862069 0.77083333 
0.77931034] mean value: 0.7796127166965825 key: test_recall value: [0.75 0.4375 0.4375 0.625 0.625 0.6875 0.46666667 0.8 0.66666667 0.53333333] mean value: 0.6029166666666667 key: train_recall value: [0.79285714 0.8 0.77857143 0.77142857 0.77142857 0.79285714 0.84397163 0.80851064 0.78723404 0.80141844] mean value: 0.7948277608915907 key: test_roc_auc value: [0.59375 0.59375 0.58541667 0.64583333 0.54583333 0.67708333 0.60833333 0.7125 0.70833333 0.67291667] mean value: 0.634375 key: train_roc_auc value: [0.775 0.775 0.79354103 0.77578521 0.78287741 0.79359169 0.79698582 0.79354103 0.77575988 0.78642351] mean value: 0.7848505572441743 key: test_jcc value: [0.48 0.35 0.35 0.47619048 0.41666667 0.52380952 0.36842105 0.57142857 0.52631579 0.44444444] mean value: 0.45072765246449453 key: train_jcc value: [0.63793103 0.64 0.65269461 0.63157895 0.63905325 0.65680473 0.67613636 0.6627907 0.63793103 0.65317919] mean value: 0.6488099867340289 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01388001 0.01374197 0.01386142 0.01398516 0.01397181 0.01414371 0.01401997 0.01408243 0.01399183 0.01399088] mean value: 0.013966917991638184 key: score_time value: [0.01013136 0.0099895 0.00989628 0.01076245 0.009938 0.01009417 0.00990272 0.00999641 0.01002407 0.01006293] mean value: 0.010079789161682128 key: test_mcc value: [0.26967994 0.438357 0.54812195 0.6778302 0.36121114 0.4184137 0.68826048 0.69203857 0.48954403 0.48333333] mean value: 0.5066790355877698 key: train_mcc value: [0.68599434 0.75001913 0.73708513 0.73015914 0.68713898 0.7388473 0.71590892 0.71561775 0.72244174 0.74385734] mean value: 0.7227069780548997 key: test_accuracy value: [0.625 0.71875 0.77419355 0.83870968 0.67741935 0.70967742 0.83870968 0.83870968 0.74193548 0.74193548] mean value: 0.7505040322580645 key: train_accuracy value: [0.84285714 0.875 0.8683274 0.86476868 0.84341637 0.8683274 0.85765125 0.85765125 0.86120996 0.87188612] mean value: 0.8611095577020844 key: test_fscore value: [0.68421053 0.70967742 0.78787879 0.84848485 0.72222222 0.72727273 0.81481481 0.84848485 0.75 0.73333333] mean value: 0.762637952816221 key: train_fscore value: [0.84507042 0.87455197 0.86545455 0.86131387 0.84507042 0.86245353 0.86111111 0.85611511 0.86120996 0.87142857] mean value: 0.8603779516928948 key: test_precision value: [0.59090909 0.73333333 0.76470588 0.82352941 0.65 0.70588235 0.91666667 0.77777778 0.70588235 0.73333333] mean value: 0.7402020202020202 key: train_precision value: [0.83333333 0.87769784 0.88148148 0.88059701 0.83333333 0.89922481 0.84353741 0.86861314 0.86428571 0.87769784] 
mean value: 0.8659801920666141 key: test_recall value: [0.8125 0.6875 0.8125 0.875 0.8125 0.75 0.73333333 0.93333333 0.8 0.73333333] mean value: 0.795 key: train_recall value: [0.85714286 0.87142857 0.85 0.84285714 0.85714286 0.82857143 0.87943262 0.84397163 0.85815603 0.86524823] mean value: 0.8553951367781155 key: test_roc_auc value: [0.625 0.71875 0.77291667 0.8375 0.67291667 0.70833333 0.83541667 0.84166667 0.74375 0.74166667] mean value: 0.7497916666666666 key: train_roc_auc value: [0.84285714 0.875 0.86826241 0.86469098 0.84346505 0.86818642 0.85757345 0.8577001 0.86122087 0.87190983] mean value: 0.8610866261398177 key: test_jcc value: [0.52 0.55 0.65 0.73684211 0.56521739 0.57142857 0.6875 0.73684211 0.6 0.57894737] mean value: 0.6196777541680287 key: train_jcc value: [0.73170732 0.77707006 0.76282051 0.75641026 0.73170732 0.75816993 0.75609756 0.74842767 0.75625 0.7721519 ] mean value: 0.7550812534377662 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.23191261 1.13450933 1.40175247 1.27486777 1.25217056 1.13011384 1.22160625 1.20618701 1.18603587 1.25448847] mean value: 1.2293644189834594 key: score_time value: [0.01473403 0.01438618 0.01480508 0.01451826 0.01454473 0.01558375 0.0150125 0.0203166 0.01496863 0.01508808] mean value: 0.015395784378051757 key: test_mcc value: [0.25197632 0.56360186 0.54812195 0.6125 0.67916667 0.48527095 0.29166667 0.69203857 0.35416667 0.61925228] mean value: 0.5097761926765115 key: train_mcc value: [0.98571429 0.97152771 0.99290744 0.99290744 0.98576494 0.99290744 0.99290744 0.98576494 0.98576494 0.99290744] mean value: 0.987907404747804 key: test_accuracy value: [0.625 0.78125 0.77419355 0.80645161 0.83870968 0.74193548 0.64516129 0.83870968 0.67741935 0.80645161] mean value: 0.7535282258064516 key: train_accuracy value: [0.99285714 0.98571429 0.99644128 0.99644128 0.99288256 0.99644128 0.99644128 0.99288256 0.99288256 0.99644128] mean value: 0.993942552109812 key: test_fscore value: [0.64705882 0.77419355 0.78787879 0.8125 0.83870968 0.76470588 0.64516129 0.84848485 0.66666667 0.8125 ] mean value: 0.7597859525041688 key: train_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [0.99285714 0.98561151 0.99641577 0.99641577 0.99285714 0.99641577 0.99646643 0.9929078 0.9929078 0.99646643] mean value: 0.9939321573361302 key: test_precision value: [0.61111111 0.8 0.76470588 0.8125 0.86666667 0.72222222 0.625 0.77777778 0.66666667 0.76470588] mean value: 0.7411356209150327 key: train_precision value: [0.99285714 0.99275362 1. 1. 0.99285714 1. 0.99295775 0.9929078 0.9929078 0.99295775] mean value: 0.9950199004697318 key: test_recall value: [0.6875 0.75 0.8125 0.8125 0.8125 0.8125 0.66666667 0.93333333 0.66666667 0.86666667] mean value: 0.7820833333333334 key: train_recall value: [0.99285714 0.97857143 0.99285714 0.99285714 0.99285714 0.99285714 1. 0.9929078 0.9929078 1. ] mean value: 0.9928672745694023 key: test_roc_auc value: [0.625 0.78125 0.77291667 0.80625 0.83958333 0.73958333 0.64583333 0.84166667 0.67708333 0.80833333] mean value: 0.75375 key: train_roc_auc value: [0.99285714 0.98571429 0.99642857 0.99642857 0.99288247 0.99642857 0.99642857 0.99288247 0.99288247 0.99642857] mean value: 0.993936170212766 key: test_jcc value: [0.47826087 0.63157895 0.65 0.68421053 0.72222222 0.61904762 0.47619048 0.73684211 0.5 0.68421053] mean value: 0.6182563292288693 key: train_jcc value: [0.9858156 0.97163121 0.99285714 0.99285714 0.9858156 0.99285714 0.99295775 0.98591549 0.98591549 0.99295775] mean value: 0.9879580318792186 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02856255 0.02037883 0.02034307 0.01903844 0.02007008 0.01872396 0.01928687 0.01898861 0.02181172 0.02207255] mean value: 0.02092766761779785 key: score_time value: [0.01176023 0.00898933 0.00875068 0.00902557 0.00892735 0.00873947 0.00879765 0.00876856 0.00897741 0.00882697] mean value: 0.009156322479248047 key: test_mcc value: [0.44539933 0.81409158 0.44824996 0.6778302 0.48333333 0.71269665 0.48527095 0.61925228 0.48954403 0.42083333] mean value: 0.5596501652767185 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71875 0.90625 0.70967742 0.83870968 0.74193548 0.83870968 0.74193548 0.80645161 0.74193548 0.70967742] mean value: 0.7754032258064516 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.74285714 0.90322581 0.66666667 0.84848485 0.75 0.86486486 0.71428571 0.8125 0.75 0.70967742] mean value: 0.7762562462965689 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.68421053 0.93333333 0.81818182 0.82352941 0.75 0.76190476 0.76923077 0.76470588 0.70588235 0.6875 ] mean value: 0.7698478856025296 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.875 0.5625 0.875 0.75 1. 0.66666667 0.86666667 0.8 0.73333333] mean value: 0.7941666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.71875 0.90625 0.71458333 0.8375 0.74166667 0.83333333 0.73958333 0.80833333 0.74375 0.71041667] mean value: 0.7754166666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.59090909 0.82352941 0.5 0.73684211 0.6 0.76190476 0.55555556 0.68421053 0.6 0.55 ] mean value: 0.6402951451713061 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10739708 0.1049428 0.11385632 0.10848141 0.10530806 0.10459304 0.10556197 0.10774064 0.10547471 0.10502815] mean value: 0.10683841705322265 key: score_time value: [0.01761246 0.01935506 0.01902342 0.01758337 0.01857352 0.0177691 0.01770043 0.01812339 0.01769495 0.01809072] mean value: 0.018152642250061034 key: test_mcc value: [0.44539933 0.50395263 0.6125 0.55 0.67916667 0.29069387 0.54812195 0.61925228 0.35983579 0.35983579] mean value: 0.4968758304596963 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.71875 0.75 0.80645161 0.77419355 0.83870968 0.64516129 0.77419355 0.80645161 0.67741935 0.67741935] mean value: 0.746875 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.74285714 0.73333333 0.8125 0.77419355 0.83870968 0.68571429 0.75862069 0.8125 0.6875 0.6875 ] mean value: 0.7533428677366386 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.68421053 0.78571429 0.8125 0.8 0.86666667 0.63157895 0.78571429 0.76470588 0.64705882 0.64705882] mean value: 0.7425208241191213 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.6875 0.8125 0.75 0.8125 0.75 0.73333333 0.86666667 0.73333333 0.73333333] mean value: 0.7691666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.71875 0.75 0.80625 0.775 0.83958333 0.64166667 0.77291667 0.80833333 0.67916667 0.67916667] mean value: 0.7470833333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.59090909 0.57894737 0.68421053 0.63157895 0.72222222 0.52173913 0.61111111 0.68421053 0.52380952 0.52380952] mean value: 0.6072547970717307 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.3 Accuracy on Blind test: 0.66 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00956964 0.00989819 0.00956774 0.009583 0.00953579 0.010221 0.0096035 0.00991416 0.00965524 0.00966287] mean value: 0.009721112251281739 key: score_time value: [0.00921845 0.00870037 0.00887227 0.00867987 0.00864482 0.00938559 0.00907087 0.0090909 0.0087533 0.00865674] mean value: 0.008907318115234375 key: test_mcc value: [0.37796447 0.32897585 0.28870546 0.43041423 0.54812195 0.55 0.36121114 0.55573827 0.68826048 0.35416667] mean value: 0.44835585149333823 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.6875 0.65625 0.64516129 0.70967742 0.77419355 0.77419355 0.67741935 0.77419355 0.83870968 0.67741935] mean value: 0.7214717741935484 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.59259259 0.66666667 0.68965517 0.78787879 0.77419355 0.61538462 0.74074074 0.81481481 0.66666667] mean value: 0.7015260272212441 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.71428571 0.72727273 0.64705882 0.76923077 0.76470588 0.8 0.72727273 0.83333333 0.91666667 0.66666667] mean value: 0.7566493310610958 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.625 0.5 0.6875 0.625 0.8125 0.75 0.53333333 0.66666667 0.73333333 0.66666667] mean value: 0.66 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.6875 0.65625 0.64375 0.7125 0.77291667 0.775 0.67291667 0.77083333 0.83541667 0.67708333] mean value: 0.7204166666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.42105263 0.5 0.52631579 0.65 0.63157895 0.44444444 0.58823529 0.6875 0.5 ] mean value: 0.5449127106983144 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.55 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.53831029 1.45324683 1.46263885 1.45239115 1.44685459 1.45993972 1.44669032 1.45897341 1.45301151 1.46730351] mean value: 1.4639360189437867 key: score_time value: [0.09898853 0.09832716 0.09827876 0.09174919 0.09818649 0.09863758 0.09640765 0.09183216 0.0928421 0.0926671 ] mean value: 0.09579167366027833 key: test_mcc value: [0.625 0.51639778 0.6778302 0.74896053 0.67916667 0.48527095 0.68826048 0.69203857 0.42083333 0.49612132] mean value: 0.6029879822988455 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.75 0.83870968 0.87096774 0.83870968 0.74193548 0.83870968 0.83870968 0.70967742 0.74193548] mean value: 0.7981854838709678 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8125 0.71428571 0.84848485 0.86666667 0.83870968 0.76470588 0.81481481 0.84848485 0.70967742 0.69230769] mean value: 0.791063756417172 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8125 0.83333333 0.82352941 0.92857143 0.86666667 0.72222222 0.91666667 0.77777778 0.6875 0.81818182] mean value: 0.8186949325184619 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.625 0.875 0.8125 0.8125 0.8125 0.73333333 0.93333333 0.73333333 0.6 ] mean value: 0.775 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.8125 0.75 0.8375 0.87291667 0.83958333 0.73958333 0.83541667 0.84166667 0.71041667 0.7375 ] mean value: 0.7977083333333334 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3.
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( [0.68421053 0.55555556 0.73684211 0.76470588 0.72222222 0.61904762 0.6875 0.73684211 0.55 0.52941176] mean value: 0.6586337780726326 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.89998579 1.01772881 0.89380383 0.91389751 0.90953159 0.89730668 0.92596483 0.93635964 0.91237187 1.07121158] mean value: 0.9378162145614624 key: score_time value: [0.24249458 0.22559667 0.13978267 0.23898888 0.2287128 0.2048564 0.23368526 0.22588706 0.22489595 0.18387413] mean value: 0.21487743854522706 key: test_mcc value: [0.438357 0.59215653 0.61608311 0.74166667 0.67916667 0.54812195 0.71269665 0.69203857 0.42083333 0.57461167] mean value: 0.6015732146319642 key: train_mcc value: [0.87859384 0.9 0.87902736 0.88611955 0.90768608 0.91467803 0.91467803 0.89325701 0.90749747 0.90035461] mean value: 0.8981891969122482 key: test_accuracy value: [0.71875 0.78125 0.80645161 0.87096774 0.83870968 0.77419355 0.83870968 0.83870968 0.70967742 0.77419355] mean value: 0.7951612903225806 key: train_accuracy value: [0.93928571 0.95 0.93950178 0.9430605 0.95373665 0.95729537 0.95729537 0.94661922 0.95373665 0.95017794] mean value: 0.9490709201830199 key: test_fscore value: [0.72727273 0.74074074 0.82352941 0.875 0.83870968 0.78787879 0.8 0.84848485 0.70967742 0.72 ] mean value: 0.7871293612916004 key: train_fscore value: [0.93950178 0.95 0.93950178 0.94285714 0.9540636 0.95683453 0.95774648 0.94699647 0.95373665 0.95035461] mean value: 0.9491593048228071 key: test_precision value: [0.70588235 0.90909091 0.77777778 0.875 0.86666667 0.76470588 1. 
0.77777778 0.6875 0.9 ] mean value: 0.8264401366607249 key: train_precision value: [0.93617021 0.95 0.93617021 0.94285714 0.94405594 0.96376812 0.95104895 0.94366197 0.95714286 0.95035461] mean value: 0.9475230018338903 key: test_recall value: [0.75 0.625 0.875 0.875 0.8125 0.8125 0.66666667 0.93333333 0.73333333 0.6 ] mean value: 0.7683333333333333 key: train_recall value: [0.94285714 0.95 0.94285714 0.94285714 0.96428571 0.95 0.96453901 0.95035461 0.95035461 0.95035461] mean value: 0.9508459979736575 key: test_roc_auc value: [0.71875 0.78125 0.80416667 0.87083333 0.83958333 0.77291667 0.83333333 0.84166667 0.71041667 0.76875 ] mean value: 0.7941666666666667 key: train_roc_auc value: [0.93928571 0.95 0.93951368 0.94305978 0.95377406 0.9572695 0.9572695 0.94660588 0.95374873 0.9501773 ] mean value: 0.9490704154002026 key: test_jcc value: [0.57142857 0.58823529 0.7 0.77777778 0.72222222 0.65 0.66666667 0.73684211 0.55 0.5625 ] mean value: 0.6525672637476043 key: train_jcc value: [0.88590604 0.9047619 0.88590604 0.89189189 0.91216216 0.91724138 0.91891892 0.89932886 0.91156463 0.90540541] mean value: 0.9033087227898283 MCC on Blind test: 0.48 Accuracy on Blind test: 0.74 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02231336 0.00941491 0.00936151 0.01038527 0.00926423 0.00929189 0.0093472 0.00929093 0.00926256 0.00975227] mean value: 0.010768413543701172 key: score_time value: [0.0153296 0.00867748 0.00866079 0.00852895 0.00856781 0.00855899 0.00863695 0.00885201 0.00861931 0.00857711] mean value: 0.009300899505615235 key: test_mcc value: [0.12909944 0.50395263 0.6125 0.4184137 0.51837044 0.55 0.4184137 0.44824996 0.35983579 0.6125 ] mean value: 0.4571335673572818 key: train_mcc value: [0.56603562 0.52531582 0.53306083 0.57396568 0.5248687 0.56166311 0.53352641 0.52595275 0.56002251 0.54704118] mean value: 0.5451452600049477 key: test_accuracy value: [0.5625 0.75 0.80645161 0.70967742 0.74193548 0.77419355 0.70967742 0.70967742 0.67741935 0.80645161] mean value: 0.7247983870967742 key: train_accuracy value: [0.78214286 0.76071429 0.76512456 0.78647687 0.76156584 0.77935943 0.76512456 0.76156584 0.77935943 0.77224199] mean value: 0.7713675648195221 key: test_fscore value: [0.61111111 0.73333333 0.8125 0.72727273 0.78947368 0.77419355 0.68965517 0.74285714 0.6875 0.8 ] mean value: 0.7367896719585731 key: train_fscore value: [0.79037801 0.77441077 0.7755102 0.79166667 0.76975945 0.78911565 0.77852349 0.77441077 0.78767123 0.78378378] mean value: 0.7815230029466407 key: test_precision value: [0.55 0.78571429 0.8125 0.70588235 0.68181818 0.8 0.71428571 0.65 0.64705882 0.8 ] mean value: 0.714725935828877 key: train_precision value: [0.7615894 0.73248408 0.74025974 0.77027027 0.74172185 0.75324675 0.7388535 0.73717949 0.7615894 0.7483871 ] mean value: 0.7485581589599934 key: test_recall 
value: [0.6875 0.6875 0.8125 0.75 0.9375 0.75 0.66666667 0.86666667 0.73333333 0.8 ] mean value: 0.7691666666666667 key: train_recall value: [0.82142857 0.82142857 0.81428571 0.81428571 0.8 0.82857143 0.82269504 0.81560284 0.81560284 0.82269504] mean value: 0.8176595744680851 key: test_roc_auc value: [0.5625 0.75 0.80625 0.70833333 0.73541667 0.775 0.70833333 0.71458333 0.67916667 0.80625 ] mean value: 0.7245833333333334 key: train_roc_auc value: [0.78214286 0.76071429 0.76529889 0.78657548 0.76170213 0.77953394 0.76491895 0.76137285 0.77922999 0.7720618 ] mean value: 0.771355116514691 key: test_jcc value: [0.44 0.57894737 0.68421053 0.57142857 0.65217391 0.63157895 0.52631579 0.59090909 0.52380952 0.66666667] mean value: 0.5866040397436278 key: train_jcc value: [0.65340909 0.63186813 0.63333333 0.65517241 0.62569832 0.65168539 0.63736264 0.63186813 0.64971751 0.64444444] mean value: 0.641455941498394 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09660125 0.07135081 0.09051609 0.06995201 0.07267976 0.07351184 0.08459163 0.07223678 0.07856154 0.14531898] mean value: 0.08553206920623779 key: score_time value: [0.01076746 0.01077509 0.01137328 0.01072907 0.01073861 0.01068258 0.01104283 0.01110196 0.01162791 0.01273251] mean value: 0.01115713119506836 key: test_mcc value: [0.438357 0.72374686 0.74166667 0.74689528 0.6125 0.80753845 0.74166667 0.48333333 0.42083333 0.74689528] mean value: 0.6463432884113032 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71875 0.84375 0.87096774 0.87096774 0.80645161 0.90322581 0.87096774 0.74193548 0.70967742 0.87096774] mean value: 0.8207661290322581 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.70967742 0.81481481 0.875 0.88235294 0.8125 0.90909091 0.86666667 0.73333333 0.70967742 0.85714286] mean value: 0.8170256360934729 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.73333333 1. 0.875 0.83333333 0.8125 0.88235294 0.86666667 0.73333333 0.6875 0.92307692] mean value: 0.8347096530920061 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6875 0.6875 0.875 0.9375 0.8125 0.9375 0.86666667 0.73333333 0.73333333 0.8 ] mean value: 0.8070833333333334 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.71875 0.84375 0.87083333 0.86875 0.80625 0.90208333 0.87083333 0.74166667 0.71041667 0.86875 ] mean value: 0.8202083333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.55 0.6875 0.77777778 0.78947368 0.68421053 0.83333333 0.76470588 0.57894737 0.55 0.75 ] mean value: 0.6965948572411421 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.03259182 0.06050849 0.03706932 0.06424642 0.06668234 0.056499 0.03153658 0.03218102 0.05929947 0.05167437] mean value: 0.049228882789611815 key: score_time value: [0.02350378 0.01192856 0.023247 0.02247643 0.02323031 0.01227117 0.01193428 0.01194763 0.02183199 0.01188064] mean value: 0.017425179481506348 key: test_mcc value: [0.46056619 0.19738551 0.35445878 0.5612264 0.61608311 0.49612132 0.4184137 0.61925228 0.48527095 0.35416667] mean value: 0.4562944907398907 key: train_mcc value: [0.82144953 0.89306221 0.85054967 0.85764944 0.85798288 0.86478545 0.8718845 0.83736545 0.87197933 0.8718845 ] mean value: 0.8598592960239736 key: test_accuracy value: [0.71875 0.59375 0.67741935 0.77419355 0.80645161 0.74193548 
0.70967742 0.80645161 0.74193548 0.67741935] mean value: 0.7247983870967741 key: train_accuracy value: [0.91071429 0.94642857 0.9252669 0.92882562 0.92882562 0.93238434 0.93594306 0.91814947 0.93594306 0.93594306] mean value: 0.9298423995932893 key: test_fscore value: [0.75675676 0.51851852 0.70588235 0.75862069 0.82352941 0.77777778 0.68965517 0.8125 0.71428571 0.66666667] mean value: 0.7224193060780282 key: train_fscore value: [0.91103203 0.94584838 0.92473118 0.92857143 0.92753623 0.93189964 0.93617021 0.91636364 0.93571429 0.93617021] mean value: 0.9294037236359098 key: test_precision value: [0.66666667 0.63636364 0.66666667 0.84615385 0.77777778 0.7 0.71428571 0.76470588 0.76923077 0.66666667] mean value: 0.7208517626164684 key: train_precision value: [0.90780142 0.95620438 0.92805755 0.92857143 0.94117647 0.9352518 0.93617021 0.94029851 0.94244604 0.93617021] mean value: 0.9352148025839478 key: test_recall value: [0.875 0.4375 0.75 0.6875 0.875 0.875 0.66666667 0.86666667 0.66666667 0.66666667] mean value: 0.7366666666666667 key: train_recall value: [0.91428571 0.93571429 0.92142857 0.92857143 0.91428571 0.92857143 0.93617021 0.89361702 0.92907801 0.93617021] mean value: 0.9237892603850051 key: test_roc_auc value: [0.71875 0.59375 0.675 0.77708333 0.80416667 0.7375 0.70833333 0.80833333 0.73958333 0.67708333] mean value: 0.7239583333333334 key: train_roc_auc value: [0.91071429 0.94642857 0.92525329 0.92882472 0.92877406 0.93237082 0.93594225 0.91823708 0.93596758 0.93594225] mean value: 0.9298454913880446 key: test_jcc value: [0.60869565 0.35 0.54545455 0.61111111 0.7 0.63636364 0.52631579 0.68421053 0.55555556 0.5 ] mean value: 0.5717706816448235 key: train_jcc value: [0.83660131 0.89726027 0.86 0.86666667 0.86486486 0.87248322 0.88 0.84563758 0.87919463 0.88 ] mean value: 0.8682708548935287 MCC on Blind test: 0.36 Accuracy on Blind test: 0.68 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01248503 0.0127666 0.00925589 0.00910306 0.00888062 0.00891542 0.00893378 0.00901175 0.00891209 0.00891852] mean value: 0.00971827507019043 key: score_time value: [0.01142454 0.0092175 0.00878191 0.00855398 0.00835586 0.00841069 0.00845838 0.00846076 0.0083797 0.00847077] mean value: 0.008851408958435059 key: test_mcc value: [0.38729833 0.12598816 0.35445878 0.35416667 0.57461167 0.28870546 0.6310315 0.63696156 0.61925228 0.4184137 ] mean value: 0.43908881135182387 key: train_mcc value: [0.46524806 0.47316995 0.46906706 0.46229203 0.44610424 0.50268922 0.46801866 0.46667052 0.44582343 0.48864808] mean value: 0.46877312600272963 key: test_accuracy value: [0.6875 0.5625 0.67741935 0.67741935 0.77419355 0.64516129 0.80645161 0.80645161 0.80645161 0.70967742] mean value: 0.7153225806451613 key: train_accuracy value: [0.73214286 0.73571429 0.73309609 0.72953737 0.72241993 0.75088968 0.73309609 0.73309609 0.72241993 0.74377224] mean value: 0.7336184544992375 key: test_fscore value: [0.72222222 0.53333333 0.70588235 0.6875 0.81081081 0.66666667 0.76923077 0.82352941 0.8125 0.68965517] mean value: 0.7221330739383478 key: train_fscore value: [0.74048443 0.74657534 0.74576271 0.74324324 0.73103448 0.75694444 0.74576271 0.74048443 0.73287671 0.75342466] mean value: 0.7436593164635377 key: test_precision value: [0.65 0.57142857 
0.66666667 0.6875 0.71428571 0.64705882 0.90909091 0.73684211 0.76470588 0.71428571] mean value: 0.7061864386903086 key: train_precision value: [0.71812081 0.71710526 0.70967742 0.70512821 0.70666667 0.73648649 0.71428571 0.72297297 0.70860927 0.72847682] mean value: 0.7167529626137138 key: test_recall value: [0.8125 0.5 0.75 0.6875 0.9375 0.6875 0.66666667 0.93333333 0.86666667 0.66666667] mean value: 0.7508333333333334 key: train_recall value: [0.76428571 0.77857143 0.78571429 0.78571429 0.75714286 0.77857143 0.78014184 0.75886525 0.75886525 0.78014184] mean value: 0.7728014184397163 key: test_roc_auc value: [0.6875 0.5625 0.675 0.67708333 0.76875 0.64375 0.80208333 0.81041667 0.80833333 0.70833333] mean value: 0.714375 key: train_roc_auc value: [0.73214286 0.73571429 0.73328267 0.72973658 0.72254306 0.75098784 0.73292806 0.73300405 0.72228977 0.74364235] mean value: 0.7336271529888551 key: test_jcc value: [0.56521739 0.36363636 0.54545455 0.52380952 0.68181818 0.5 0.625 0.7 0.68421053 0.52631579] mean value: 0.5715462321812436 key: train_jcc value: [0.58791209 0.59562842 0.59459459 0.59139785 0.57608696 0.60893855 0.59459459 0.58791209 0.57837838 0.6043956 ] mean value: 0.5919839116558032 MCC on Blind test: 0.31 Accuracy on Blind test: 0.66 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01262021 0.01605392 0.01505256 0.01697302 0.01637769 0.01547146 0.01523614 0.0172348 0.01650333 0.01944351] mean value: 0.016096663475036622 key: score_time value: [0.00845838 0.01096606 0.01121974 0.0114274 0.01142192 0.0114193 0.01137662 0.01137543 0.01142097 0.01138806] mean value: 0.01104738712310791 key: test_mcc value: [0.31814238 0.37796447 0.6125 0.55777335 0.372678 0.4770843 0.48527095 0.61608311 0.57461167 0.25389818] mean value: 0.4646006411623224 key: train_mcc value: [0.73914049 0.78579447 0.7083207 0.6231496 0.39685714 0.60003451 0.66383564 0.71092171 0.75796241 0.70017814] mean value: 0.6686194811960497 key: test_accuracy value: [0.65625 0.6875 0.80645161 0.74193548 0.61290323 0.70967742 0.74193548 0.80645161 0.77419355 0.61290323] mean value: 0.7150201612903225 key: train_accuracy value: [0.85714286 0.89285714 0.85409253 0.77935943 0.63701068 0.76512456 0.82562278 0.83985765 0.87544484 0.82918149] mean value: 0.8155693950177936 key: test_fscore value: [0.62068966 0.66666667 0.8125 0.8 0.4 0.64 0.71428571 0.78571429 0.72 0.66666667] mean value: 0.6826522988505747 key: train_fscore value: [0.83606557 0.89361702 0.85198556 0.81871345 0.42696629 0.69158879 0.80784314 0.81327801 0.86692015 0.85454545] mean value: 0.7861523434278199 key: test_precision value: [0.69230769 0.71428571 0.8125 0.66666667 1. 0.88888889 0.76923077 0.84615385 0.9 0.57142857] mean value: 0.7861462148962148 key: train_precision value: [0.98076923 0.88732394 0.86131387 0.69306931 1. 1. 
0.90350877 0.98 0.93442623 0.74603175] mean value: 0.8986443097444802 key: test_recall value: [0.5625 0.625 0.8125 1. 0.25 0.5 0.66666667 0.73333333 0.6 0.8 ] mean value: 0.655 key: train_recall value: [0.72857143 0.9 0.84285714 1. 0.27142857 0.52857143 0.73049645 0.69503546 0.80851064 1. ] mean value: 0.7505471124620061 key: test_roc_auc value: [0.65625 0.6875 0.80625 0.73333333 0.625 0.71666667 0.73958333 0.80416667 0.76875 0.61875 ] mean value: 0.715625 key: train_roc_auc value: [0.85714286 0.89285714 0.85405268 0.78014184 0.63571429 0.76428571 0.82596251 0.84037487 0.87568389 0.82857143] mean value: 0.8154787234042553 key: test_jcc value: [0.45 0.5 0.68421053 0.66666667 0.25 0.47058824 0.55555556 0.64705882 0.5625 0.5 ] mean value: 0.5286579807361541 key: train_jcc value: [0.71830986 0.80769231 0.74213836 0.69306931 0.27142857 0.52857143 0.67763158 0.68531469 0.76510067 0.74603175] mean value: 0.6635288519992544 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, 
gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01545095 0.01672411 0.01582861 0.01625156 0.01635623 0.0165143 0.01512074 0.01825953 0.01661515 0.01824665] mean value: 0.016536784172058106 key: score_time value: [0.01169014 0.01137948 0.01162958 0.01137543 0.01138616 0.01170897 0.01159811 0.0119915 0.01221395 0.01147962] mean value: 0.011645293235778809 key: test_mcc value: [0.37796447 0.32897585 0.37191715 0.66057826 0.58316015 0.49612132 0.48333333 0.6681531 0.42083333 0.42321607] mean value: 0.4814253040803977 key: train_mcc value: [0.63245553 0.60648725 0.72898583 0.69608742 0.70724431 0.71414649 0.72646619 0.60723774 0.69071737 0.78942195] mean value: 0.6899250078740438 key: test_accuracy value: [0.65625 0.65625 0.67741935 0.80645161 0.77419355 0.74193548 0.74193548 0.80645161 0.70967742 0.70967742] mean value: 0.7280241935483871 key: train_accuracy value: [0.78571429 0.775 0.86120996 0.83629893 0.83985765 0.84697509 0.86120996 0.77580071 0.82562278 0.89323843] mean value: 0.8300927808845958 key: test_fscore value: [0.52173913 0.7027027 0.64285714 0.84210526 0.74074074 0.77777778 0.73333333 0.83333333 0.70967742 0.66666667] mean value: 0.7170933510359213 key: train_fscore value: [0.72727273 0.81415929 0.85057471 0.85443038 0.81327801 0.86261981 0.86868687 0.81524927 0.85106383 0.88888889] mean value: 0.8346223782529265 key: test_precision value: [0.85714286 0.61904762 0.75 0.72727273 0.90909091 0.7 0.73333333 0.71428571 0.6875 0.75 ] mean value: 0.744767316017316 key: train_precision value: [1. 
0.69346734 0.91735537 0.76704545 0.97029703 0.78034682 0.82692308 0.695 0.74468085 0.93023256] mean value: 0.8325348499768358 key: test_recall value: [0.375 0.8125 0.5625 1. 0.625 0.875 0.73333333 1. 0.73333333 0.6 ] mean value: 0.7316666666666667 key: train_recall value: [0.57142857 0.98571429 0.79285714 0.96428571 0.7 0.96428571 0.91489362 0.9858156 0.9929078 0.85106383] mean value: 0.8723252279635259 key: test_roc_auc value: [0.65625 0.65625 0.68125 0.8 0.77916667 0.7375 0.74166667 0.8125 0.71041667 0.70625 ] mean value: 0.728125 key: train_roc_auc value: [0.78571429 0.775 0.86096758 0.83675279 0.8393617 0.84739108 0.86101824 0.77505066 0.82502533 0.89338906] mean value: 0.8299670719351571 key: test_jcc value: [0.35294118 0.54166667 0.47368421 0.72727273 0.58823529 0.63636364 0.57894737 0.71428571 0.55 0.5 ] mean value: 0.5663396794124348 key: train_jcc value: [0.57142857 0.68656716 0.74 0.74585635 0.68531469 0.75842697 0.76785714 0.68811881 0.74074074 0.8 ] mean value: 0.7184310436284728 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.14749765 0.12956071 0.130409 0.13197517 0.13010478 0.13024807 0.12997556 0.1305058 0.13536215 0.13055086] mean value: 0.13261897563934327 key: score_time value: [0.01478624 0.01482654 0.01473641 0.01463675 0.0148015 0.01491022 0.01483297 0.01472735 0.01480532 0.01476169] mean value: 0.01478250026702881 key: test_mcc value: [0.56360186 0.48653363 0.74689528 0.6778302 0.63696156 0.6778302 0.48333333 0.35445878 0.29166667 0.48527095] mean value: 0.5404382465644691 key: train_mcc value: [1. 0.99288247 1. 0.98586555 0.96443769 0.98586412 0.98586412 0.99290744 1. 1. ] mean value: 0.9907821400761553 key: test_accuracy value: [0.78125 0.71875 0.87096774 0.83870968 0.80645161 0.83870968 0.74193548 0.67741935 0.64516129 0.74193548] mean value: 0.7661290322580645 key: train_accuracy value: [1. 0.99642857 1. 0.99288256 0.98220641 0.99288256 0.99288256 0.99644128 1. 1. ] mean value: 0.9953723945094052 key: test_fscore value: [0.77419355 0.64 0.88235294 0.84848485 0.78571429 0.84848485 0.73333333 0.64285714 0.64516129 0.71428571] mean value: 0.7514867953046321 key: train_fscore value: [1. 0.99644128 1. 0.9929078 0.98220641 0.99280576 0.99295775 0.99646643 1. 1. ] mean value: 0.9953785421221143 key: test_precision value: [0.8 0.88888889 0.83333333 0.82352941 0.91666667 0.82352941 0.73333333 0.69230769 0.625 0.76923077] mean value: 0.7905819507290095 key: train_precision value: [1. 0.9929078 1. 0.98591549 0.9787234 1. 0.98601399 0.99295775 1. 1. 
] mean value: 0.9936518431124365 key: test_recall value: [0.75 0.5 0.9375 0.875 0.6875 0.875 0.73333333 0.6 0.66666667 0.66666667] mean value: 0.7291666666666666 key: train_recall value: [1. 1. 1. 1. 0.98571429 0.98571429 1. 1. 1. 1. ] mean value: 0.9971428571428571 key: test_roc_auc value: [0.78125 0.71875 0.86875 0.8375 0.81041667 0.8375 0.74166667 0.675 0.64583333 0.73958333] mean value: 0.765625 key: train_roc_auc value: [1. 0.99642857 1. 0.9929078 0.98221884 0.99285714 0.99285714 0.99642857 1. 1. ] mean value: 0.9953698074974671 key: test_jcc value: [0.63157895 0.47058824 0.78947368 0.73684211 0.64705882 0.73684211 0.57894737 0.47368421 0.47619048 0.55555556] mean value: 0.6096761511622193 key: train_jcc value: [1. 0.9929078 1. 0.98591549 0.96503497 0.98571429 0.98601399 0.99295775 1. 1. ] mean value: 0.9908544277618296 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05952263 0.08039355 0.07478786 0.0584228 0.06083059 0.06444645 0.05043173 0.07103467 0.06729746 0.07202792] mean value: 0.06591956615447998 key: score_time value: [0.01850533 0.03880453 0.02360177 0.02254486 0.02347612 0.02948332 0.01844764 0.03052831 0.02998972 0.02165484] mean value: 0.02570364475250244 key: test_mcc value: [0.438357 0.75592895 0.43041423 0.80833333 0.6778302 0.61608311 0.61608311 0.48333333 0.54812195 0.66057826] mean value: 0.6035063490310029 key: train_mcc value: [0.95802308 0.97879618 0.9439125 0.9716269 0.95039023 0.93652752 0.97152989 0.95078573 0.97192667 0.95768728] mean value: 0.9591205983699052 key: test_accuracy value: [0.71875 0.875 0.70967742 0.90322581 0.83870968 0.80645161 0.80645161 0.74193548 0.77419355 0.80645161] mean value: 0.7980846774193548 key: train_accuracy value: [0.97857143 0.98928571 0.97153025 0.98576512 0.97508897 0.96797153 0.98576512 0.97508897 0.98576512 0.97864769] mean value: 0.9793479918657855 key: test_fscore value: [0.70967742 0.86666667 0.68965517 0.90322581 0.84848485 0.82352941 0.78571429 0.73333333 0.75862069 0.75 ] mean value: 0.7868907633839257 key: train_fscore value: [0.97810219 0.98916968 0.97080292 0.98561151 0.97472924 0.96727273 0.9858156 0.97472924 0.98561151 0.97841727] mean value: 0.9790261886213206 key: test_precision value: [0.73333333 0.92857143 0.76923077 0.93333333 0.82352941 0.77777778 0.84615385 0.73333333 0.78571429 1. ] mean value: 0.8330977519212813 key: train_precision value: [1. 1. 
0.99253731 0.99275362 0.98540146 0.98518519 0.9858156 0.99264706 1. 0.99270073] mean value: 0.9927040973247857 key: test_recall value: [0.6875 0.8125 0.625 0.875 0.875 0.875 0.73333333 0.73333333 0.73333333 0.6 ] mean value: 0.755 key: train_recall value: [0.95714286 0.97857143 0.95 0.97857143 0.96428571 0.95 0.9858156 0.95744681 0.97163121 0.96453901] mean value: 0.9658004052684903 key: test_roc_auc value: [0.71875 0.875 0.7125 0.90416667 0.8375 0.80416667 0.80416667 0.74166667 0.77291667 0.8 ] mean value: 0.7970833333333334 key: train_roc_auc value: [0.97857143 0.98928571 0.9714539 0.98573961 0.97505066 0.9679078 0.98576494 0.97515198 0.9858156 0.97869807] mean value: 0.9793439716312057 key: test_jcc value: [0.55 0.76470588 0.52631579 0.82352941 0.73684211 0.7 0.64705882 0.57894737 0.61111111 0.6 ] mean value: 0.6538510491916064 key: train_jcc value: [0.95714286 0.97857143 0.94326241 0.97163121 0.95070423 0.93661972 0.97202797 0.95070423 0.97163121 0.95774648] mean value: 0.9590041728324616 MCC on Blind test: 0.56 Accuracy on Blind test: 0.78 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.06940794 0.07174492 0.07724261 0.07404685 0.07390547 0.07745314 0.07874274 0.04722691 0.05125523 0.1250515 ] mean value: 0.0746077299118042 key: score_time value: [0.0226953 0.02435803 0.02252817 0.02285171 0.03218961 0.02162409 0.03372359 0.01344872 0.01349521 0.02339959] mean value: 0.023031401634216308 key: test_mcc value: [0.19088543 0.44539933 0.55 0.61925228 0.42083333 0.35445878 0.35983579 0.69203857 0.61925228 0.54812195] mean value: 0.4800077745753996 key: train_mcc value: [0.98571429 0.98571429 0.98586412 0.98576494 0.98576494 0.98576494 0.98576494 0.98576494 0.98576494 0.98576494] mean value: 0.9857647305833265 key: test_accuracy value: [0.59375 0.71875 0.77419355 0.80645161 0.70967742 0.67741935 0.67741935 0.83870968 0.80645161 0.77419355] mean value: 0.7377016129032258 key: train_accuracy value: [0.99285714 0.99285714 0.99288256 0.99288256 0.99288256 0.99288256 0.99288256 0.99288256 0.99288256 0.99288256] mean value: 0.9928774783934926 key: test_fscore value: [0.62857143 0.68965517 0.77419355 0.8 0.70967742 0.70588235 0.6875 0.84848485 0.8125 0.75862069] mean value: 0.7415085459808355 key: train_fscore value: [0.99285714 0.99285714 0.99280576 0.99285714 0.99285714 0.99285714 0.9929078 0.9929078 0.9929078 0.9929078 ] mean value: 0.9928722675355157 key: test_precision value: [0.57894737 0.76923077 0.8 0.85714286 0.73333333 0.66666667 0.64705882 0.77777778 0.76470588 0.78571429] mean value: 0.7380577764169095 key: train_precision value: [0.99285714 0.99285714 1. 
0.99285714 0.99285714 0.99285714 0.9929078 0.9929078 0.9929078 0.9929078 ] mean value: 0.9935916919959473 key: test_recall value: [0.6875 0.625 0.75 0.75 0.6875 0.75 0.73333333 0.93333333 0.86666667 0.73333333] mean value: 0.7516666666666667 key: train_recall value: [0.99285714 0.99285714 0.98571429 0.99285714 0.99285714 0.99285714 0.9929078 0.9929078 0.9929078 0.9929078 ] mean value: 0.9921631205673759 key: test_roc_auc value: [0.59375 0.71875 0.775 0.80833333 0.71041667 0.675 0.67916667 0.84166667 0.80833333 0.77291667] mean value: 0.7383333333333334 key: train_roc_auc value: [0.99285714 0.99285714 0.99285714 0.99288247 0.99288247 0.99288247 0.99288247 0.99288247 0.99288247 0.99288247] mean value: 0.9928748733535968 key: test_jcc value: [0.45833333 0.52631579 0.63157895 0.66666667 0.55 0.54545455 0.52380952 0.73684211 0.68421053 0.61111111] mean value: 0.5934322548796233 key: train_jcc value: [0.9858156 0.9858156 0.98571429 0.9858156 0.9858156 0.9858156 0.98591549 0.98591549 0.98591549 0.98591549] mean value: 0.9858454271729669 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.50104094 0.4917357 0.48631215 0.49081039 0.50207019 0.4933908 0.49305654 0.48927522 0.47748899 0.47835588] mean value: 0.4903536796569824 key: score_time value: [0.0100379 0.0117836 0.00951767 0.01037145 0.01037979 0.00925756 0.00933361 0.0092392 0.00934649 0.00907397] mean value: 0.00983412265777588 key: test_mcc value: [0.625 0.82717019 0.55 0.80753845 0.61925228 0.6778302 0.61608311 0.48333333 0.42083333 0.61608311] mean value: 0.6243124021507616 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.90625 0.77419355 0.90322581 0.80645161 0.83870968 0.80645161 0.74193548 0.70967742 0.80645161] mean value: 0.8105846774193548 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8125 0.89655172 0.77419355 0.90909091 0.8 0.84848485 0.78571429 0.73333333 0.70967742 0.78571429] mean value: 0.8055260354217528 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8125 1. 0.8 0.88235294 0.85714286 0.82352941 0.84615385 0.73333333 0.6875 0.84615385] mean value: 0.828866623572506 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.8125 0.75 0.9375 0.75 0.875 0.73333333 0.73333333 0.73333333 0.73333333] mean value: 0.7870833333333334 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.8125 0.90625 0.775 0.90208333 0.80833333 0.8375 0.80416667 0.74166667 0.71041667 0.80416667] mean value: 0.8102083333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68421053 0.8125 0.63157895 0.83333333 0.66666667 0.73684211 0.64705882 0.57894737 0.55 0.64705882] mean value: 0.6788196594427245 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.58 Accuracy on Blind test: 0.79 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02253461 0.02282524 0.02330494 0.03614402 0.02317977 0.02384901 0.02774358 0.02345324 0.02367496 0.02446938] mean value: 0.025117874145507812 key: score_time value: [0.01177716 0.01243234 0.01277757 0.01222444 0.01561308 0.01716805 0.01649523 0.01491165 0.01481533 0.01486301] mean value: 0.014307785034179687 key: test_mcc value: [0.44539933 0.37796447 0.48527095 0.48333333 0.48333333 0.50596443 0.55 0.71807033 0.23012754 0.25389818] mean value: 0.453336189415535 key: train_mcc value: [1. 0.95060645 1. 0.98586555 0.94460323 1. 
0.97192106 0.96501929 0.89833485 0.95816272] mean value: 0.9674513135125098 key: test_accuracy value: [0.71875 0.6875 0.74193548 0.74193548 0.74193548 0.70967742 0.77419355 0.83870968 0.61290323 0.61290323] mean value: 0.7180443548387097 key: train_accuracy value: [1. 0.975 1. 0.99288256 0.97153025 1. 0.98576512 0.98220641 0.94661922 0.97864769] mean value: 0.9832651245551601 key: test_fscore value: [0.74285714 0.70588235 0.76470588 0.75 0.75 0.7804878 0.77419355 0.85714286 0.625 0.66666667] mean value: 0.741693625522593 key: train_fscore value: [1. 0.9754386 1. 0.9929078 0.97222222 1. 0.98601399 0.9825784 0.94949495 0.97916667] mean value: 0.9837822619520036 key: test_precision value: [0.68421053 0.66666667 0.72222222 0.75 0.75 0.64 0.75 0.75 0.58823529 0.57142857] mean value: 0.6872763280750896 key: train_precision value: [1. 0.95862069 1. 0.98591549 0.94594595 1. 0.97241379 0.96575342 0.90384615 0.95918367] mean value: 0.9691679173635389 key: test_recall value: [0.8125 0.75 0.8125 0.75 0.75 1. 0.8 1. 0.66666667 0.8 ] mean value: 0.8141666666666667 key: train_recall value: [1. 0.99285714 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9992857142857143 key: test_roc_auc value: [0.71875 0.6875 0.73958333 0.74166667 0.74166667 0.7 0.775 0.84375 0.61458333 0.61875 ] mean value: 0.718125 key: train_roc_auc value: [1. 0.975 1. 0.9929078 0.97163121 1. 0.98571429 0.98214286 0.94642857 0.97857143] mean value: 0.9832396149949342 key: test_jcc value: [0.59090909 0.54545455 0.61904762 0.6 0.6 0.64 0.63157895 0.75 0.45454545 0.5 ] mean value: 0.5931535657325131 key: train_jcc value: [1. 0.95205479 1. 0.98591549 0.94594595 1. 
0.97241379 0.96575342 0.90384615 0.95918367] mean value: 0.9685113278500764 MCC on Blind test: 0.14 Accuracy on Blind test: 0.59 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02304816 0.05340958 0.04013824 0.02969098 0.01440239 0.01423526 0.01434231 0.02990079 0.03520942 0.03498721] mean value: 0.028936433792114257 key: score_time value: [0.01924634 0.01247501 0.034832 0.01187229 0.01180601 0.01168656 0.01202822 0.02046657 0.02182174 0.02312374] mean value: 0.017935848236083983 key: test_mcc value: [0.12909944 0.31814238 0.61608311 0.6778302 0.61608311 0.54812195 0.61608311 0.67916667 0.4184137 0.48954403] mean value: 0.5108567730282921 key: train_mcc value: [0.79287737 0.82890983 0.80799639 0.82226276 0.78654305 0.82991071 0.77955111 0.8442081 0.80105406 0.82208713] mean value: 0.8115400507144125 key: test_accuracy value: [0.5625 0.65625 0.80645161 0.83870968 0.80645161 0.77419355 0.80645161 0.83870968 0.70967742 0.74193548] mean value: 0.754133064516129 key: train_accuracy value: [0.89642857 0.91428571 0.90391459 0.91103203 0.89323843 0.91459075 0.88967972 0.92170819 0.90035587 0.91103203] mean value: 0.9056265887137773 key: test_fscore value: [0.61111111 0.62068966 0.82352941 0.84848485 0.82352941 0.78787879 0.78571429 0.83870968 
0.68965517 0.75 ] mean value: 0.7579302361724007 key: train_fscore value: [0.89605735 0.91304348 0.90252708 0.91166078 0.89208633 0.91240876 0.88888889 0.92028986 0.89928058 0.91103203] mean value: 0.9047275117158565 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:156: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:159: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_precision value: [0.55 0.69230769 0.77777778 0.82352941 0.77777778 0.76470588 0.84615385 0.8125 0.71428571 0.70588235] mean value: 0.7464920455361632 key: train_precision value: [0.89928058 0.92647059 0.91240876 0.9020979 0.89855072 0.93283582 0.89855072 0.94074074 0.91240876 0.91428571] mean value: 0.913763030931828 key: test_recall value: [0.6875 0.5625 0.875 0.875 0.875 0.8125 0.73333333 0.86666667 0.66666667 0.8 ] mean value: 0.7754166666666666 key: train_recall value: [0.89285714 0.9 0.89285714 0.92142857 0.88571429 0.89285714 0.87943262 0.90070922 0.88652482 0.90780142] mean value: 0.8960182370820668 key: test_roc_auc value: [0.5625 0.65625 0.80416667 0.8375 0.80416667 0.77291667 0.80416667 0.83958333 0.70833333 0.74375 ] mean value: 0.7533333333333333 key: train_roc_auc value: [0.89642857 0.91428571 0.90387538 0.9110689 0.89321175 0.91451368 0.88971631 0.92178318 0.90040527 0.91104357] mean value: 0.9056332320162107 key: test_jcc value: [0.44 0.45 0.7 0.73684211 0.7 0.65 0.64705882 0.72222222 0.52631579 0.6 ] mean
value: 0.6172438940488476 key: train_jcc value: [0.81168831 0.84 0.82236842 0.83766234 0.80519481 0.83892617 0.8 0.85234899 0.81699346 0.83660131] mean value: 0.8261783814625151 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.27001762 0.23835921 0.23745036 0.23633218 0.24392223 0.31006145 0.33711934 0.26849365 0.2593081 0.27248955] mean value: 0.2673553705215454 key: score_time value: [0.02025056 0.02202392 0.02044964 0.02047157 0.01809716 0.01300192 0.03175521 0.0256021 0.02553105 0.02415633] mean value: 0.022133946418762207 key: test_mcc value: [0.12909944 0.37796447 0.61608311 0.74689528 0.48527095 0.54812195 0.6310315 0.63696156 0.42083333 0.42083333] mean value: 0.5013094945304436 key: train_mcc value: [0.79287737 0.67942124 0.80799639 0.65949742 0.68682877 0.82991071 0.65890803 0.65840807 0.68688251 0.72283925] mean value: 0.7183569755574486 key: test_accuracy value: [0.5625 0.6875 0.80645161 0.87096774 0.74193548 0.77419355 0.80645161 0.80645161 0.70967742 0.70967742] mean value: 0.7475806451612903 key: train_accuracy value: [0.89642857 0.83928571 0.90391459 0.82918149 0.84341637 0.91459075 0.82918149 0.82918149 0.84341637 0.86120996] mean value: 0.8589806812404677 key: test_fscore value: 
[0.61111111 0.66666667 0.82352941 0.88235294 0.76470588 0.78787879 0.76923077 0.82352941 0.70967742 0.70967742] mean value: 0.7548359820655836 key: train_fscore value: [0.89605735 0.84320557 0.90252708 0.83333333 0.84285714 0.91240876 0.83333333 0.83098592 0.84507042 0.8641115 ] mean value: 0.8603890403329323 key: test_precision value: [0.55 0.71428571 0.77777778 0.83333333 0.72222222 0.76470588 0.90909091 0.73684211 0.6875 0.6875 ] mean value: 0.7383257944326056 key: train_precision value: [0.89928058 0.82312925 0.91240876 0.81081081 0.84285714 0.93283582 0.81632653 0.82517483 0.83916084 0.84931507] mean value: 0.8551299624368872 key: test_recall value: [0.6875 0.625 0.875 0.9375 0.8125 0.8125 0.66666667 0.93333333 0.73333333 0.73333333] mean value: 0.7816666666666666 key: train_recall value: [0.89285714 0.86428571 0.89285714 0.85714286 0.84285714 0.89285714 0.85106383 0.83687943 0.85106383 0.87943262] mean value: 0.86612968591692 key: test_roc_auc value: [0.5625 0.6875 0.80416667 0.86875 0.73958333 0.77291667 0.80208333 0.81041667 0.71041667 0.71041667] mean value: 0.746875 key: train_roc_auc value: [0.89642857 0.83928571 0.90387538 0.82928065 0.84341439 0.91451368 0.82910334 0.829154 0.84338906 0.86114488] mean value: 0.8589589665653495 key: test_jcc value: [0.44 0.5 0.7 0.78947368 0.61904762 0.65 0.625 0.7 0.55 0.55 ] mean value: 0.6123521303258146 key: train_jcc value: [0.81168831 0.72891566 0.82236842 0.71428571 0.72839506 0.83892617 0.71428571 0.71084337 0.73170732 0.7607362 ] mean value: 0.7562151947074178 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03315091 0.03327417 0.03551388 0.02638865 0.03421259 0.02820277 0.05422091 0.03478932 0.04252434 0.03702164] mean value: 0.03592991828918457 key: score_time value: [0.01228666 0.01443219 0.01238513 0.01229572 0.01432252 0.01221967 0.01476145 0.01246548 0.01230025 0.01673579] mean value: 0.013420486450195312 key: test_mcc value: [0.5 0.3086067 0.46291005 0.46291005 0.84615385 0.54494926 0.20645591 0.28022427 0.12179487 0.44230769] mean value: 0.4176312649662118 key: train_mcc value: [0.72173913 0.67849178 0.68695652 0.67828651 0.69567848 0.66956522 0.66288973 0.74895052 0.72293853 0.71429643] mean value: 0.6979792842720367 key: test_accuracy value: [0.73076923 0.65384615 0.73076923 0.73076923 0.92307692 0.76923077 0.6 0.64 0.56 0.72 ] mean value: 0.7058461538461538 key: train_accuracy value: [0.86086957 0.83913043 0.84347826 0.83913043 0.84782609 0.83478261 0.83116883 0.87445887 0.86147186 0.85714286] mean value: 
0.8489459815546773 key: test_fscore value: [0.77419355 0.64 0.72 0.74074074 0.92307692 0.78571429 0.61538462 0.57142857 0.56 0.72 ] mean value: 0.7050538684732233 key: train_fscore value: [0.86086957 0.83700441 0.84347826 0.83982684 0.84848485 0.83478261 0.83544304 0.87445887 0.86086957 0.8558952 ] mean value: 0.849111320253814 key: test_precision value: [0.66666667 0.66666667 0.75 0.71428571 0.92307692 0.73333333 0.57142857 0.66666667 0.58333333 0.75 ] mean value: 0.7025457875457876 key: train_precision value: [0.86086957 0.84821429 0.84347826 0.8362069 0.84482759 0.83478261 0.81818182 0.87826087 0.86086957 0.85964912] mean value: 0.848534057902696 key: test_recall value: [0.92307692 0.61538462 0.69230769 0.76923077 0.92307692 0.84615385 0.66666667 0.5 0.53846154 0.69230769] mean value: 0.7166666666666667 key: train_recall value: [0.86086957 0.82608696 0.84347826 0.84347826 0.85217391 0.83478261 0.85344828 0.87068966 0.86086957 0.85217391] mean value: 0.8498050974512743 key: test_roc_auc value: [0.73076923 0.65384615 0.73076923 0.73076923 0.92307692 0.76923077 0.6025641 0.63461538 0.56089744 0.72115385] mean value: 0.7057692307692307 key: train_roc_auc value: [0.86086957 0.83913043 0.84347826 0.83913043 0.84782609 0.83478261 0.83107196 0.87447526 0.86146927 0.85712144] mean value: 0.8489355322338831 key: test_jcc value: [0.63157895 0.47058824 0.5625 0.58823529 0.85714286 0.64705882 0.44444444 0.4 0.38888889 0.5625 ] mean value: 0.5552937490785788 key: train_jcc value: [0.75572519 0.71969697 0.72932331 0.7238806 0.73684211 0.71641791 0.7173913 0.77692308 0.75572519 0.7480916 ] mean value: 0.7380017256697218 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.91960263 0.77476716 0.93569636 0.74241829 0.72241664 0.87651324 0.72533584 0.79127908 0.79439926 0.76359153] mean value: 0.8046020030975342 key: score_time value: [0.01186204 0.01184845 0.01432681 0.01440716 0.01184154 0.01197195 0.01194453 0.0124135 0.01192331 0.01433039] mean value: 0.012686967849731445 key: test_mcc value: [0.26013299 0.3086067 0.46291005 0.46291005 0.84615385 0.5 0.20645591 0.28022427 0.19611614 0.44230769] mean value: 0.39658176429496 key: train_mcc value: [0.57445626 0.62611063 0.80027235 0.76524632 0.63487863 0.47942232 0.66288973 0.80157649 0.55957724 0.78383447] mean value: 0.6688264435025673 key: test_accuracy value: [0.61538462 0.65384615 0.73076923 0.73076923 0.92307692 0.73076923 0.6 0.64 0.6 0.72 ] mean value: 0.6944615384615385 key: train_accuracy value: [0.78695652 0.81304348 0.9 0.8826087 0.8173913 0.73913043 0.83116883 0.9004329 0.77922078 0.89177489] mean value: 0.8341727837380012 key: test_fscore value: [0.6875 0.64 0.72 0.74074074 0.92307692 0.77419355 0.61538462 0.57142857 0.64285714 0.72 ] mean value: 0.7035181541875091 key: train_fscore value: [0.79148936 0.81385281 0.90128755 0.88209607 0.81896552 0.74789916 0.83544304 0.90295359 0.78481013 0.89270386] mean value: 0.8371501089693046 key: test_precision value: [0.57894737 0.66666667 0.75 0.71428571 0.92307692 0.66666667 0.57142857 0.66666667 0.6 0.75 ] mean value: 0.6887738577212261 key: train_precision value: [0.775 0.81034483 0.88983051 0.88596491 0.81196581 0.72357724 0.81818182 0.88429752 0.76229508 0.88135593] mean value: 0.8242813649093232 key: 
test_recall value: [0.84615385 0.61538462 0.69230769 0.76923077 0.92307692 0.92307692 0.66666667 0.5 0.69230769 0.69230769] mean value: 0.7320512820512821 key: train_recall value: [0.80869565 0.8173913 0.91304348 0.87826087 0.82608696 0.77391304 0.85344828 0.92241379 0.80869565 0.90434783] mean value: 0.8506296851574213 key: test_roc_auc value: [0.61538462 0.65384615 0.73076923 0.73076923 0.92307692 0.73076923 0.6025641 0.63461538 0.59615385 0.72115385] mean value: 0.6939102564102564 key: train_roc_auc value: [0.78695652 0.81304348 0.9 0.8826087 0.8173913 0.73913043 0.83107196 0.90033733 0.77934783 0.89182909] mean value: 0.8341716641679161 key: test_jcc value: [0.52380952 0.47058824 0.5625 0.58823529 0.85714286 0.63157895 0.44444444 0.4 0.47368421 0.5625 ] mean value: 0.5514483512703326 key: train_jcc value: [0.65492958 0.68613139 0.8203125 0.7890625 0.69343066 0.59731544 0.7173913 0.82307692 0.64583333 0.80620155] mean value: 0.7233685168647699 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.02167606 0.00983524 0.01029277 0.01009822 0.00993466 0.00891805 0.00893712 0.0100255 0.01011395 0.01003551] mean value: 0.010986709594726562 key: score_time value: [0.01178932 0.0098989 0.0096755 0.00934529 0.00883579 0.00862885 0.0086813 0.0093863 0.00939846 0.00942039] mean value: 0.009506011009216308 key: test_mcc value: [0.43355498 0.23354968 0.43355498 0.23354968 0.63245553 0.56591646 0.35897436 0.27742513 0.1990977 0.5423696 ] mean value: 0.3910448113136374 key: train_mcc value: [0.46517657 0.48902898 0.45643546 0.46727535 0.46517657 0.45232785 0.48477646 0.51262311 0.4779845 0.42722854] mean value: 0.46980333989342027 key: test_accuracy value: [0.69230769 0.61538462 0.69230769 0.61538462 0.80769231 0.76923077 0.68 0.64 0.6 0.76 ] mean value: 0.6872307692307692 key: train_accuracy value: [0.72608696 0.73913043 0.7173913 0.72608696 0.72608696 0.72608696 0.73593074 0.74891775 0.73160173 0.68831169] mean value: 0.7265631469979296 key: test_fscore value: [0.75 0.64285714 0.75 0.64285714 0.82758621 0.8 0.66666667 0.60869565 0.66666667 0.8 ] mean value: 0.7155329478118084 key: train_fscore value: [0.75486381 0.76377953 0.75471698 0.75675676 0.75486381 0.72246696 0.76447876 0.77692308 0.75968992 0.74647887] mean value: 0.7555018489381352 key: test_precision value: [0.63157895 0.6 0.63157895 0.6 0.75 0.70588235 0.66666667 0.63636364 0.58823529 0.70588235] mean value: 0.6516188197767145 key: train_precision value: [0.68309859 0.69784173 0.66666667 0.68055556 0.68309859 0.73214286 0.69230769 0.70138889 0.68531469 0.62721893] mean value: 0.6849634190504885 key: test_recall 
value: [0.92307692 0.69230769 0.92307692 0.69230769 0.92307692 0.92307692 0.66666667 0.58333333 0.76923077 0.92307692] mean value: 0.801923076923077 key: train_recall value: [0.84347826 0.84347826 0.86956522 0.85217391 0.84347826 0.71304348 0.85344828 0.87068966 0.85217391 0.92173913] mean value: 0.8463268365817092 key: test_roc_auc value: [0.69230769 0.61538462 0.69230769 0.61538462 0.80769231 0.76923077 0.67948718 0.63782051 0.59294872 0.75320513] mean value: 0.6855769230769231 key: train_roc_auc value: [0.72608696 0.73913043 0.7173913 0.72608696 0.72608696 0.72608696 0.73541979 0.74838831 0.73212144 0.68931784] mean value: 0.7266116941529235 key: test_jcc value: [0.6 0.47368421 0.6 0.47368421 0.70588235 0.66666667 0.5 0.4375 0.5 0.66666667] mean value: 0.5624084107327141 key: train_jcc value: [0.60625 0.61783439 0.60606061 0.60869565 0.60625 0.56551724 0.61875 0.63522013 0.6125 0.59550562] mean value: 0.607258363828198 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00918698 0.00909734 0.00939155 0.00925922 0.00944185 0.0103898 0.00985146 0.01025653 0.01025677 0.00990176] mean value: 0.009703326225280761 key: score_time value: [0.00874496 0.00913382 0.00859189 0.00863647 0.00933886 0.00939512 0.00858855 0.00945616 0.00945187 0.00943851] mean value: 0.009077620506286622 key: test_mcc value: [0.24253563 0.23354968 0.40422604 0.6172134 0.9258201 0.60697698 0.22017621 0.19611614 0.19871795 0.28022427] mean value: 0.3925556392796845 key: train_mcc value: [0.49753679 0.56736651 0.51428939 0.50589946 0.46149812 0.47100984 0.53514724 0.52692012 0.53245877 0.48251499] mean value: 0.5094641245565984 key: test_accuracy value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077 0.6 0.6 0.6 0.64 ] mean value: 0.6901538461538461 key: train_accuracy value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261 0.76623377 0.76190476 0.76623377 0.74025974] mean value: 0.7538979860718991 key: test_fscore value: [0.66666667 0.58333333 0.73333333 0.81481481 0.96296296 0.8125 0.64285714 0.54545455 0.61538462 0.68965517] mean value: 0.7066962587221208 key: train_fscore value: [0.75833333 0.79166667 0.76470588 0.76150628 0.73728814 0.74476987 0.77868852 0.7755102 0.76521739 0.75 ] mean value: 0.7627686288549921 key: test_precision value: [0.58823529 0.63636364 0.64705882 0.78571429 0.92857143 0.68421053 0.5625 0.6 0.61538462 0.625 ] mean value: 0.6673038609996814 key: train_precision value: [0.728 0.76 0.7398374 0.73387097 0.71900826 0.71774194 0.7421875 0.73643411 0.76521739 0.72 ] mean value: 0.736229756589408 key: test_recall 
value: [0.76923077 0.53846154 0.84615385 0.84615385 1. 1. 0.75 0.5 0.61538462 0.76923077] mean value: 0.7634615384615384 key: train_recall value: [0.79130435 0.82608696 0.79130435 0.79130435 0.75652174 0.77391304 0.81896552 0.81896552 0.76521739 0.7826087 ] mean value: 0.7916191904047976 key: test_roc_auc value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077 0.60576923 0.59615385 0.59935897 0.63461538] mean value: 0.6897435897435897 key: train_roc_auc value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261 0.7660045 0.76165667 0.76622939 0.74044228] mean value: 0.7538680659670165 key: test_jcc value: [0.5 0.41176471 0.57894737 0.6875 0.92857143 0.68421053 0.47368421 0.375 0.44444444 0.52631579] mean value: 0.5610438473635068 key: train_jcc value: [0.61073826 0.65517241 0.61904762 0.61486486 0.58389262 0.59333333 0.63758389 0.63333333 0.61971831 0.6 ] mean value: 0.616768463933208 MCC on Blind test: 0.29 Accuracy on Blind test: 0.65 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00914311 0.00849319 0.00899696 0.00962973 0.00991821 0.00909781 0.00972629 0.00986457 0.00982285 0.00978184] mean value: 0.009447455406188965 key: score_time value: [0.01053095 0.00990725 0.01061821 0.0104363 0.01081729 0.0106883 0.01077843 0.0106926 0.01073742 0.01077318] mean value: 0.010597991943359374 key: test_mcc value: [-0.23354968 0.07692308 0.46291005 0.3086067 0.46291005 0.23076923 0.28205128 -0.2941742 0.11613145 0.35954625] mean value: 0.1772124200408744 key: train_mcc value: [0.53915082 0.54990908 0.51585963 0.56521739 0.44354534 0.49595227 0.51623761 0.55003766 0.48062074 0.53284841] mean value: 0.5189378957673179 key: test_accuracy value: [0.38461538 0.53846154 0.73076923 0.65384615 0.73076923 0.61538462 0.64 0.36 0.56 0.68 ] mean value: 0.5893846153846154 key: train_accuracy value: [0.76956522 0.77391304 0.75652174 0.7826087 0.72173913 0.74782609 0.75757576 0.77489177 0.74025974 0.76623377] mean value: 0.7591134952004517 key: test_fscore value: [0.42857143 0.53846154 0.72 0.64 0.74074074 0.61538462 0.64 0.27272727 0.59259259 0.71428571] mean value: 0.5902763902763902 key: train_fscore value: [0.77056277 0.78333333 0.76859504 0.7826087 0.71929825 0.75213675 0.76666667 0.77966102 0.74137931 0.76923077] mean value: 0.7633472601812795 key: test_precision value: [0.4 0.53846154 0.75 0.66666667 0.71428571 0.61538462 0.61538462 0.3 0.57142857 0.66666667] mean value: 0.5838278388278388 key: train_precision value: [0.76724138 0.752 0.73228346 0.7826087 0.72566372 0.7394958 0.74193548 0.76666667 0.73504274 0.75630252] mean value: 0.7499240461251708 key: 
test_recall value: [0.46153846 0.53846154 0.69230769 0.61538462 0.76923077 0.61538462 0.66666667 0.25 0.61538462 0.76923077] mean value: 0.5993589743589743 key: train_recall value: [0.77391304 0.8173913 0.80869565 0.7826087 0.71304348 0.76521739 0.79310345 0.79310345 0.74782609 0.7826087 ] mean value: 0.7777511244377812 key: test_roc_auc value: [0.38461538 0.53846154 0.73076923 0.65384615 0.73076923 0.61538462 0.64102564 0.35576923 0.55769231 0.67628205] mean value: 0.5884615384615385 key: train_roc_auc value: [0.76956522 0.77391304 0.75652174 0.7826087 0.72173913 0.74782609 0.75742129 0.77481259 0.74029235 0.76630435] mean value: 0.7591004497751125 key: test_jcc value: [0.27272727 0.36842105 0.5625 0.47058824 0.58823529 0.44444444 0.47058824 0.15789474 0.42105263 0.55555556] mean value: 0.43120074584857865 key: train_jcc value: [0.62676056 0.64383562 0.62416107 0.64285714 0.56164384 0.60273973 0.62162162 0.63888889 0.5890411 0.625 ] mean value: 0.6176549564546041 MCC on Blind test: 0.18 Accuracy on Blind test: 0.59 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01466942 0.01401472 0.01425433 0.01414585 0.01411271 0.0141561 0.01212788 0.01342058 0.01408696 0.01221085] mean value: 0.013719940185546875 key: score_time value: [0.01055813 0.0103178 0.01053596 0.01041269 0.01040769 0.00974035 0.01015902 0.00984025 0.00954103 0.00947142] mean value: 0.010098433494567871 key: test_mcc value: [0.26013299 0.3086067 0.46291005 0.6172134 0.9258201 0.47434165 0.13074409 0.11342411 0.11613145 0.43871881] mean value: 0.3848043345500712 key: train_mcc value: [0.67849178 0.64369733 0.65857921 0.66956522 0.63516695 0.63632416 0.66423848 0.73287422 0.68182751 0.6456866 ] mean value: 0.6646451453192566 key: test_accuracy value: [0.61538462 0.65384615 0.73076923 0.80769231 0.96153846 0.73076923 0.56 0.56 0.56 0.72 ] mean value: 0.6900000000000001 key: train_accuracy value: [0.83913043 0.82173913 0.82608696 0.83478261 0.8173913 0.8173913 0.83116883 0.86580087 0.83982684 0.82251082] mean value: 0.8315829098437795 key: test_fscore value: [0.6875 0.64 0.74074074 0.81481481 0.96296296 0.75862069 0.59259259 0.47619048 0.59259259 0.74074074] mean value: 0.7006755610290093 key: train_fscore value: [0.84120172 0.82403433 0.83739837 0.83478261 0.82051282 0.82352941 0.83817427 0.87029289 0.84518828 0.82553191] mean value: 0.8360646626759719 key: test_precision value: [0.57894737 0.66666667 0.71428571 0.78571429 0.92857143 0.6875 0.53333333 0.55555556 0.57142857 0.71428571] mean value: 0.6736288638262322 key: train_precision value: [0.83050847 0.81355932 0.78625954 0.83478261 0.80672269 0.79674797 0.808 0.84552846 0.81451613 0.80833333] mean value: 
0.8144958521496004 key: test_recall value: [0.84615385 0.61538462 0.76923077 0.84615385 1. 0.84615385 0.66666667 0.41666667 0.61538462 0.76923077] mean value: 0.7391025641025641 key: train_recall value: [0.85217391 0.83478261 0.89565217 0.83478261 0.83478261 0.85217391 0.87068966 0.89655172 0.87826087 0.84347826] mean value: 0.8593328335832084 key: test_roc_auc value: [0.61538462 0.65384615 0.73076923 0.80769231 0.96153846 0.73076923 0.56410256 0.55448718 0.55769231 0.71794872] mean value: 0.6894230769230769 key: train_roc_auc value: [0.83913043 0.82173913 0.82608696 0.83478261 0.8173913 0.8173913 0.830997 0.86566717 0.8399925 0.8226012 ] mean value: 0.8315779610194903 key: test_jcc value: [0.52380952 0.47058824 0.58823529 0.6875 0.92857143 0.61111111 0.42105263 0.3125 0.42105263 0.58823529] mean value: 0.555265615017937 key: train_jcc value: [0.72592593 0.70072993 0.72027972 0.71641791 0.69565217 0.7 0.72142857 0.77037037 0.73188406 0.70289855] mean value: 0.7185587208068345 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.79133201 1.04224086 0.68274426 1.19078088 1.04643345 1.26272154 1.36647201 1.08281016 1.21882343 0.64180708] mean value: 1.0326165676116943 key: score_time value: [0.01203728 0.01447105 0.01207662 0.01443744 0.01444626 0.01231718 0.01682568 0.01575065 0.01480103 0.01208425] mean value: 0.013924741744995117 key: test_mcc value: [0.31622777 0.3086067 0.47434165 0.31622777 0.77151675 0.54494926 0.2941742 0.28022427 0.35897436 0.27742513] mean value: 0.3942667851121904 key: train_mcc value: [0.88725848 0.99134183 0.91428873 0.98275733 0.95684738 0.96521739 0.96550951 0.99137867 0.95703684 0.83716866] mean value: 0.9448804798337176 key: test_accuracy value: [0.65384615 0.65384615 0.73076923 0.65384615 0.88461538 0.76923077 0.64 0.64 0.68 0.64 ] mean value: 0.6946153846153846 key: train_accuracy value: [0.94347826 0.99565217 0.95652174 0.99130435 0.97826087 0.9826087 0.98268398 0.995671 0.97835498 0.91341991] mean value: 0.9717955957086392 key: test_fscore value: [0.68965517 0.64 0.69565217 0.68965517 0.88888889 0.78571429 0.66666667 0.57142857 0.69230769 0.66666667] mean value: 0.6986635290413401 key: train_fscore value: [0.94273128 0.99563319 0.95535714 0.99137931 0.97854077 0.9826087 0.98290598 0.99570815 0.97854077 0.91935484] mean value: 0.9722760135346585 key: test_precision value: [0.625 0.66666667 0.8 0.625 0.85714286 0.73333333 0.6 0.66666667 0.69230769 0.64285714] mean value: 0.6908974358974359 key: train_precision value: [0.95535714 1. 
0.98165138 0.98290598 0.96610169 0.9826087 0.97457627 0.99145299 0.96610169 0.85714286] mean value: 0.9657898707174887 key: test_recall value: [0.76923077 0.61538462 0.61538462 0.76923077 0.92307692 0.84615385 0.75 0.5 0.69230769 0.69230769] mean value: 0.7173076923076923 key: train_recall value: [0.93043478 0.99130435 0.93043478 1. 0.99130435 0.9826087 0.99137931 1. 0.99130435 0.99130435] mean value: 0.9800074962518741 key: test_roc_auc value: [0.65384615 0.65384615 0.73076923 0.65384615 0.88461538 0.76923077 0.64423077 0.63461538 0.67948718 0.63782051] mean value: 0.6942307692307692 key: train_roc_auc value: [0.94347826 0.99565217 0.95652174 0.99130435 0.97826087 0.9826087 0.98264618 0.99565217 0.97841079 0.91375562] mean value: 0.9718290854572714 key: test_jcc value: [0.52631579 0.47058824 0.53333333 0.52631579 0.8 0.64705882 0.5 0.4 0.52941176 0.5 ] mean value: 0.5433023735810114 key: train_jcc value: [0.89166667 0.99130435 0.91452991 0.98290598 0.95798319 0.96581197 0.96638655 0.99145299 0.95798319 0.85074627] mean value: 0.9470771079026795 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02313399 0.01868463 0.01814771 0.01593852 0.01778412 0.01742077 0.01628041 0.01744723 0.01802492 0.01848912] mean value: 0.01813514232635498 key: score_time value: [0.01164889 0.00898504 0.00868201 0.00876403 0.00867915 0.0086062 0.00851274 0.00851393 0.00859714 0.00859332] mean value: 0.008958244323730468 key: test_mcc value: [0.77151675 0.70064905 0.31622777 0.6172134 0.69230769 0.53846154 0.11613145 0.36774959 0.6025641 0.60001249] mean value: 0.5322833824348638 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88461538 0.84615385 0.65384615 0.80769231 0.84615385 0.76923077 0.56 0.68 0.8 0.8 ] mean value: 0.7647692307692308 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.85714286 0.60869565 0.81481481 0.84615385 0.76923077 0.52173913 0.69230769 0.8 0.81481481] mean value: 0.7613788465962379 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85714286 0.8 0.7 0.78571429 0.84615385 0.76923077 0.54545455 0.64285714 0.83333333 0.78571429] mean value: 0.7565601065601065 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92307692 0.92307692 0.53846154 0.84615385 0.84615385 0.76923077 0.5 0.75 0.76923077 0.84615385] mean value: 0.7711538461538462 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.88461538 0.84615385 0.65384615 0.80769231 0.84615385 0.76923077 0.55769231 0.68269231 0.80128205 0.79807692] mean value: 0.7647435897435897 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.75 0.4375 0.6875 0.73333333 0.625 0.35294118 0.52941176 0.66666667 0.6875 ] mean value: 0.626985294117647 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09956813 0.10019636 0.0992651 0.09971094 0.09992576 0.09951663 0.0993073 0.09910083 0.10106516 0.09976125] mean value: 0.09974174499511719 key: score_time value: [0.01748419 0.01745272 0.01728463 0.01737189 0.01731944 0.01762104 0.01746798 0.01737499 0.01740623 0.01749277] mean value: 0.017427587509155275 key: test_mcc value: [0.23354968 0.23076923 0.23354968 0.15430335 0.85634884 0.53846154 0.35954625 0.02746175 0.20645591 0.19871795] mean value: 0.30391641821947946 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.61538462 0.61538462 0.61538462 0.57692308 0.92307692 0.76923077 0.68 0.52 0.6 0.6 ] mean value: 0.6515384615384615 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.64285714 0.61538462 0.58333333 0.59259259 0.92857143 0.76923077 0.63636364 0.4 0.58333333 0.61538462] mean value: 0.6367051467051468 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6 0.61538462 0.63636364 0.57142857 0.86666667 0.76923077 0.7 0.5 0.63636364 0.61538462] mean value: 0.651082251082251 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.69230769 0.61538462 0.53846154 0.61538462 1. 0.76923077 0.58333333 0.33333333 0.53846154 0.61538462] mean value: 0.6301282051282051 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.61538462 0.61538462 0.61538462 0.57692308 0.92307692 0.76923077 0.67628205 0.51282051 0.6025641 0.59935897] mean value: 0.6506410256410257 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.47368421 0.44444444 0.41176471 0.42105263 0.86666667 0.625 0.46666667 0.25 0.41176471 0.44444444] mean value: 0.4815488476092191 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00924373 0.00909591 0.00902104 0.00901556 0.00903559 0.00899434 0.00899005 0.01007128 0.00922346 0.00893664] mean value: 0.00916275978088379 key: score_time value: [0.00859094 0.00852585 0.00848031 0.00861931 0.00847697 0.00852418 0.00880098 0.00921869 0.00855017 0.00851154] mean value: 0.008629894256591797 key: test_mcc value: [ 0.07784989 0.38924947 0.6172134 -0.31622777 0.15430335 0.3086067 0.11342411 0.28022427 0.19871795 0.27742513] mean value: 0.21007865056115643 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.53846154 0.69230769 0.80769231 0.34615385 0.57692308 0.65384615 0.56 0.64 0.6 0.64 ] mean value: 0.6055384615384616 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.57142857 0.66666667 0.81481481 0.26086957 0.59259259 0.64 0.47619048 0.57142857 0.61538462 0.66666667] mean value: 0.5876042540390367 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.53333333 0.72727273 0.78571429 0.3 0.57142857 0.66666667 0.55555556 0.66666667 0.61538462 0.64285714] mean value: 0.6064879564879565 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.61538462 0.61538462 0.84615385 0.23076923 0.61538462 0.61538462 0.41666667 0.5 0.61538462 0.69230769] mean value: 0.5762820512820513 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.53846154 0.69230769 0.80769231 0.34615385 0.57692308 0.65384615 0.55448718 0.63461538 0.59935897 0.63782051] mean value: 0.6041666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.4 0.5 0.6875 0.15 0.42105263 0.47058824 0.3125 0.4 0.44444444 0.5 ] mean value: 0.42860853113175096 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.58 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, 
gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.36223483 1.38157392 1.35260463 1.36770082 1.45512009 1.4243691 1.37369204 1.38668728 1.35794759 1.35540032] mean value: 1.3817330598831177 key: score_time value: [0.09519005 0.09159613 0.09452748 0.0909903 0.09830999 0.09202361 0.09135485 0.09582472 0.09233117 0.09036994] mean value: 0.09325182437896729 key: test_mcc value: [0.47434165 0.53846154 0.40422604 0.38924947 0.84615385 0.53846154 0.51923077 0.1990977 0.35897436 0.44230769] mean value: 0.47105046071111284 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73076923 0.76923077 0.69230769 0.69230769 0.92307692 0.76923077 0.76 0.6 0.68 0.72 ] mean value: 0.7336923076923076 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75862069 0.76923077 0.63636364 0.71428571 0.92307692 0.76923077 0.75 0.5 0.69230769 0.72 ] mean value: 0.7233116194150677 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6875 0.76923077 0.77777778 0.66666667 0.92307692 0.76923077 0.75 0.625 0.69230769 0.75 ] mean value: 0.7410790598290599 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.84615385 0.76923077 0.53846154 0.76923077 0.92307692 0.76923077 0.75 0.41666667 0.69230769 0.69230769] mean value: 0.7166666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.73076923 0.76923077 0.69230769 0.69230769 0.92307692 0.76923077 0.75961538 0.59294872 0.67948718 0.72115385] mean value: 0.7330128205128205 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.61111111 0.625 0.46666667 0.55555556 0.85714286 0.625 0.6 0.33333333 0.52941176 0.5625 ] mean value: 0.5765721288515406 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.49 Accuracy on Blind test: 0.75 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, 
use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.87057018 0.89433622 0.93123794 0.86043167 0.92215872 0.94535112 0.91315413 0.94380188 0.93137598 0.8592701 ] mean value: 0.9071687936782837 key: score_time value: [0.26281023 0.11465859 0.26945305 0.24121785 0.2097342 0.23504758 0.22764254 0.22707939 0.18495941 0.23176789] mean value: 0.22043707370758056 key: test_mcc value: [0.31622777 0.53846154 0.56591646 0.6172134 0.9258201 0.69230769 0.6025641 0.1990977 0.12179487 0.6025641 ] mean value: 0.518196773243632 key: train_mcc value: [0.87839372 0.88725848 0.87038828 0.86099978 0.89619446 0.88699006 0.89662441 0.87890832 0.86146927 0.88748126] mean value: 0.880470802843822 key: test_accuracy value: [0.65384615 0.76923077 0.76923077 0.80769231 0.96153846 0.84615385 0.8 0.6 0.56 0.8 ] mean value: 0.7567692307692307 key: train_accuracy value: [0.93913043 0.94347826 0.93478261 0.93043478 0.94782609 0.94347826 0.94805195 0.93939394 0.93073593 0.94372294] mean value: 0.9401035196687371 key: test_fscore value: [0.68965517 0.76923077 0.72727273 0.81481481 0.96296296 0.84615385 0.8 0.5 0.56 0.8 ] mean value: 0.7470090292848914 key: train_fscore value: [0.93965517 0.94420601 0.93617021 0.92982456 0.94871795 0.94372294 0.94915254 0.94017094 0.93043478 0.94372294] mean value: 0.9405778056483304 key: test_precision value: [0.625 0.76923077 0.88888889 0.78571429 0.92857143 0.84615385 0.76923077 0.625 0.58333333 0.83333333] mean value: 0.7654456654456655 key: 
train_precision value: [0.93162393 0.93220339 0.91666667 0.9380531 0.93277311 0.93965517 0.93333333 0.93220339 0.93043478 0.93965517] mean value: 0.9326602045310061 key: test_recall value: [0.76923077 0.76923077 0.61538462 0.84615385 1. 0.84615385 0.83333333 0.41666667 0.53846154 0.76923077] mean value: 0.7403846153846154 key: train_recall value: [0.94782609 0.95652174 0.95652174 0.92173913 0.96521739 0.94782609 0.96551724 0.94827586 0.93043478 0.94782609] mean value: 0.9487706146926537 key: test_roc_auc value: [0.65384615 0.76923077 0.76923077 0.80769231 0.96153846 0.84615385 0.80128205 0.59294872 0.56089744 0.80128205] mean value: 0.7564102564102564 key: train_roc_auc value: [0.93913043 0.94347826 0.93478261 0.93043478 0.94782609 0.94347826 0.94797601 0.93935532 0.93073463 0.94374063] mean value: 0.9400937031484258 key: test_jcc value: [0.52631579 0.625 0.57142857 0.6875 0.92857143 0.73333333 0.66666667 0.33333333 0.38888889 0.66666667] mean value: 0.6127704678362573 key: train_jcc value: [0.88617886 0.89430894 0.88 0.86885246 0.90243902 0.89344262 0.90322581 0.88709677 0.8699187 0.89344262] mean value: 0.8878905814018478 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00985193 0.00925279 0.0090847 0.00928497 0.00924087 0.00906801 0.00925469 0.00909114 0.00927258 0.00920963] mean value: 0.009261131286621094 key: score_time value: [0.00861406 0.00864077 0.00928879 0.00863123 0.00860333 0.0085845 0.00866389 0.00860381 0.00862241 0.00861168] mean value: 0.008686447143554687 key: test_mcc value: [0.24253563 0.23354968 0.40422604 0.6172134 0.9258201 0.60697698 0.22017621 0.19611614 0.19871795 0.28022427] mean value: 0.3925556392796845 key: train_mcc value: [0.49753679 0.56736651 0.51428939 0.50589946 0.46149812 0.47100984 0.53514724 0.52692012 0.53245877 0.48251499] mean value: 0.5094641245565984 key: test_accuracy value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077 0.6 0.6 0.6 0.64 ] mean value: 0.6901538461538461 key: train_accuracy value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261 0.76623377 0.76190476 0.76623377 0.74025974] mean value: 0.7538979860718991 key: test_fscore value: [0.66666667 0.58333333 0.73333333 0.81481481 0.96296296 0.8125 0.64285714 0.54545455 0.61538462 0.68965517] mean value: 0.7066962587221208 key: train_fscore value: [0.75833333 0.79166667 0.76470588 0.76150628 0.73728814 0.74476987 0.77868852 0.7755102 0.76521739 0.75 ] mean value: 0.7627686288549921 key: test_precision value: [0.58823529 0.63636364 0.64705882 0.78571429 0.92857143 0.68421053 0.5625 0.6 0.61538462 0.625 ] mean value: 0.6673038609996814 key: train_precision value: [0.728 0.76 0.7398374 0.73387097 0.71900826 0.71774194 0.7421875 0.73643411 0.76521739 0.72 ] mean value: 0.736229756589408 key: test_recall 
value: [0.76923077 0.53846154 0.84615385 0.84615385 1. 1. 0.75 0.5 0.61538462 0.76923077] mean value: 0.7634615384615384 key: train_recall value: [0.79130435 0.82608696 0.79130435 0.79130435 0.75652174 0.77391304 0.81896552 0.81896552 0.76521739 0.7826087 ] mean value: 0.7916191904047976 key: test_roc_auc value: [0.61538462 0.61538462 0.69230769 0.80769231 0.96153846 0.76923077 0.60576923 0.59615385 0.59935897 0.63461538] mean value: 0.6897435897435897 key: train_roc_auc value: [0.74782609 0.7826087 0.75652174 0.75217391 0.73043478 0.73478261 0.7660045 0.76165667 0.76622939 0.74044228] mean value: 0.7538680659670165 key: test_jcc value: [0.5 0.41176471 0.57894737 0.6875 0.92857143 0.68421053 0.47368421 0.375 0.44444444 0.52631579] mean value: 0.5610438473635068 key: train_jcc value: [0.61073826 0.65517241 0.61904762 0.61486486 0.58389262 0.59333333 0.63758389 0.63333333 0.61971831 0.6 ] mean value: 0.616768463933208 MCC on Blind test: 0.29 Accuracy on Blind test: 0.65 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), 
('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08827448 0.0708189 0.07702231 0.22159338 0.06050777 0.06993937 0.06500721 0.08993006 0.05968285 0.05984974] mean value: 0.08626260757446289 key: score_time value: [0.011204 0.01124358 0.0114789 0.01113033 0.01171565 0.01125646 0.01110744 0.01070189 0.01019716 0.01020026] mean value: 0.011023569107055663 key: test_mcc value: [0.77151675 0.69230769 0.5 0.69230769 0.77151675 0.6172134 0.37073365 0.43871881 0.52904327 0.52904327] mean value: 0.5912401278510183 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88461538 0.84615385 0.73076923 0.84615385 0.88461538 0.80769231 0.68 0.72 0.76 0.76 ] mean value: 0.792 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.84615385 0.66666667 0.84615385 0.88888889 0.81481481 0.6 0.69565217 0.75 0.75 ] mean value: 0.7747219125479995 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85714286 0.84615385 0.875 0.84615385 0.85714286 0.78571429 0.75 0.72727273 0.81818182 0.81818182] mean value: 0.8180944055944056 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92307692 0.84615385 0.53846154 0.84615385 0.92307692 0.84615385 0.5 0.66666667 0.69230769 0.69230769] mean value: 0.7474358974358974 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.88461538 0.84615385 0.73076923 0.84615385 0.88461538 0.80769231 0.67307692 0.71794872 0.76282051 0.76282051] mean value: 0.7916666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.73333333 0.5 0.73333333 0.8 0.6875 0.42857143 0.53333333 0.6 0.6 ] mean value: 0.6416071428571428 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.54 Accuracy on Blind test: 0.77 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0325067 0.03058124 0.03802133 0.05640101 0.05467892 0.02662373 0.02773499 0.04848194 0.04608655 0.05499625] mean value: 0.04161126613616943 key: score_time value: [0.01179028 0.01180315 0.02161908 0.0220561 0.01192045 0.01189947 0.01557565 0.01187563 0.04020929 0.02342272] mean value: 0.018217182159423827 key: test_mcc value: [ 0.6172134 0.15430335 0.54494926 0. 
0.53846154 0.16666667 0.35897436 -0.13074409 0.35897436 0.35897436] mean value: 0.2967773202682685 key: train_mcc value: [0.84360585 0.88725848 0.82621191 0.85220613 0.85246403 0.89619446 0.87878561 0.95674339 0.84415292 0.85283755] mean value: 0.8690460318996313 key: test_accuracy value: [0.80769231 0.57692308 0.76923077 0.5 0.76923077 0.57692308 0.68 0.44 0.68 0.68 ] mean value: 0.648 key: train_accuracy value: [0.92173913 0.94347826 0.91304348 0.92608696 0.92608696 0.94782609 0.93939394 0.97835498 0.92207792 0.92640693] mean value: 0.9344494635798983 key: test_fscore value: [0.8 0.56 0.78571429 0.48 0.76923077 0.64516129 0.66666667 0.36363636 0.69230769 0.69230769] mean value: 0.645502476018605 key: train_fscore value: [0.92105263 0.94420601 0.9122807 0.92640693 0.92703863 0.94871795 0.93965517 0.97854077 0.92173913 0.92576419] mean value: 0.9345402111171843 key: test_precision value: [0.83333333 0.58333333 0.73333333 0.5 0.76923077 0.55555556 0.66666667 0.4 0.69230769 0.69230769] mean value: 0.6426068376068376 key: train_precision value: [0.92920354 0.93220339 0.92035398 0.92241379 0.91525424 0.93277311 0.93965517 0.97435897 0.92173913 0.92982456] mean value: 0.9317779890200742 key: test_recall value: [0.76923077 0.53846154 0.84615385 0.46153846 0.76923077 0.76923077 0.66666667 0.33333333 0.69230769 0.69230769] mean value: 0.6538461538461539 key: train_recall value: [0.91304348 0.95652174 0.90434783 0.93043478 0.93913043 0.96521739 0.93965517 0.98275862 0.92173913 0.92173913] mean value: 0.9374587706146926 key: test_roc_auc value: [0.80769231 0.57692308 0.76923077 0.5 0.76923077 0.57692308 0.67948718 0.43589744 0.67948718 0.67948718] mean value: 0.6474358974358975 key: train_roc_auc value: [0.92173913 0.94347826 0.91304348 0.92608696 0.92608696 0.94782609 0.9393928 0.97833583 0.92207646 0.92638681] mean value: 0.9344452773613193 key: test_jcc value: [0.66666667 0.38888889 0.64705882 0.31578947 0.625 0.47619048 0.5 0.22222222 0.52941176 0.52941176] mean value: 
0.4900640080593641 key: train_jcc value: [0.85365854 0.89430894 0.83870968 0.86290323 0.864 0.90243902 0.88617886 0.95798319 0.85483871 0.86178862] mean value: 0.8776808789920374 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), 
('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0127399 0.00920939 0.00912523 0.00896168 0.00895476 0.00874591 0.00885916 0.00897789 0.00905085 0.00910902] mean value: 0.00937337875366211 key: score_time value: [0.01170278 0.00882316 0.00865865 0.00838375 0.0084579 0.00852489 0.00889111 0.00839472 0.0085063 0.00878024] mean value: 0.008912348747253418 key: test_mcc value: [0.33333333 0.38461538 0.54494926 0.3086067 0.70064905 0.56591646 0.44702443 0.37073365 0.35954625 0.43871881] mean value: 0.44540933216111894 key: train_mcc value: [0.46262193 0.46262193 0.4451645 0.45301392 0.45425676 0.44455524 0.45638654 0.47461305 0.46446918 0.48335689] mean value: 0.4601059925977649 key: test_accuracy value: [0.65384615 0.69230769 0.76923077 0.65384615 0.84615385 0.76923077 0.72 0.68 0.68 0.72 ] mean value: 0.7184615384615385 key: train_accuracy value: [0.73043478 0.73043478 0.72173913 0.72608696 0.72608696 0.72173913 0.72727273 0.73593074 0.73160173 0.74025974] mean value: 0.7291586674195369 key: test_fscore value: [0.70967742 0.69230769 0.78571429 0.64 
0.85714286 0.8 0.66666667 0.6 0.71428571 0.74074074] mean value: 0.7206535376212796 key: train_fscore value: [0.74166667 0.74166667 0.73333333 0.73417722 0.73858921 0.73109244 0.74074074 0.75102041 0.7394958 0.75206612] mean value: 0.7403848593375401 key: test_precision value: [0.61111111 0.69230769 0.73333333 0.66666667 0.8 0.70588235 0.77777778 0.75 0.66666667 0.71428571] mean value: 0.7118031315090139 key: train_precision value: [0.712 0.712 0.704 0.71311475 0.70634921 0.70731707 0.70866142 0.71317829 0.71544715 0.71653543] mean value: 0.7108603333057187 key: test_recall value: [0.84615385 0.69230769 0.84615385 0.61538462 0.92307692 0.92307692 0.58333333 0.5 0.76923077 0.76923077] mean value: 0.7467948717948718 key: train_recall value: [0.77391304 0.77391304 0.76521739 0.75652174 0.77391304 0.75652174 0.77586207 0.79310345 0.76521739 0.79130435] mean value: 0.7725487256371814 key: test_roc_auc value: [0.65384615 0.69230769 0.76923077 0.65384615 0.84615385 0.76923077 0.71474359 0.67307692 0.67628205 0.71794872] mean value: 0.7166666666666667 key: train_roc_auc value: [0.73043478 0.73043478 0.72173913 0.72608696 0.72608696 0.72173913 0.72706147 0.73568216 0.73174663 0.74047976] mean value: 0.729149175412294 key: test_jcc value: [0.55 0.52941176 0.64705882 0.47058824 0.75 0.66666667 0.5 0.42857143 0.55555556 0.58823529] mean value: 0.568608776844071 key: train_jcc value: [0.58940397 0.58940397 0.57894737 0.58 0.58552632 0.57615894 0.58823529 0.60130719 0.58666667 0.60264901] mean value: 0.5878298728577058 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01029062 0.01638079 0.01510525 0.01574659 0.01706219 0.01559711 0.01537395 0.01762438 0.01753545 0.01649976] mean value: 0.015721607208251952 key: score_time value: [0.00846767 0.01144028 0.01150489 0.01210713 0.01146317 0.01151443 0.01178288 0.01176548 0.01183534 0.01185989] mean value: 0.011374115943908691 key: test_mcc value: [0.48795004 0.3086067 0.16666667 0.42640143 0.66666667 0.72760688 0.21245915 0.28022427 0.36774959 0.1990977 ] mean value: 0.38434290828396633 key: train_mcc value: [0.60151666 0.70534562 0.55386282 0.52704628 0.47265659 0.5203059 0.3545926 0.82730706 0.74633927 0.66731915] mean value: 0.5976291936314948 key: test_accuracy value: [0.69230769 0.65384615 0.57692308 0.65384615 0.80769231 0.84615385 0.56 0.64 0.68 0.6 ] mean value: 0.6710769230769231 key: train_accuracy value: [0.76956522 0.84782609 0.73913043 0.7173913 0.6826087 0.71304348 0.61038961 0.91341991 0.86580087 0.81385281] mean value: 0.7673028420854507 key: test_fscore value: [0.76470588 0.66666667 0.64516129 0.47058824 0.83870968 0.81818182 0.15384615 0.57142857 0.66666667 0.66666667] mean value: 0.6262621628845538 key: train_fscore value: [0.8113879 0.85943775 0.79166667 0.60606061 0.75907591 0.59756098 0.36619718 0.91525424 0.85024155 0.8401487 ] mean value: 0.7397031472452881 key: test_precision value: [0.61904762 0.64285714 0.55555556 1. 0.72222222 1. 1. 0.66666667 0.72727273 0.58823529] mean value: 0.7521857227739581 key: train_precision value: [0.68674699 0.79850746 0.65895954 1. 0.61170213 1. 1. 
0.9 0.95652174 0.73376623] mean value: 0.8346204088766872 key: test_recall value: [1. 0.69230769 0.76923077 0.30769231 1. 0.69230769 0.08333333 0.5 0.61538462 0.76923077] mean value: 0.642948717948718 key: train_recall value: [0.99130435 0.93043478 0.99130435 0.43478261 1. 0.42608696 0.22413793 0.93103448 0.76521739 0.9826087 ] mean value: 0.7676911544227886 key: test_roc_auc value: [0.69230769 0.65384615 0.57692308 0.65384615 0.80769231 0.84615385 0.54166667 0.63461538 0.68269231 0.59294872] mean value: 0.6682692307692307 key: train_roc_auc value: [0.76956522 0.84782609 0.73913043 0.7173913 0.6826087 0.71304348 0.61206897 0.91334333 0.86536732 0.81458021] mean value: 0.7674925037481259 key: test_jcc value: [0.61904762 0.5 0.47619048 0.30769231 0.72222222 0.69230769 0.08333333 0.4 0.5 0.5 ] mean value: 0.4800793650793651 key: train_jcc value: [0.68263473 0.75352113 0.65517241 0.43478261 0.61170213 0.42608696 0.22413793 0.84375 0.7394958 0.72435897] mean value: 0.6095642667682339 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01598191 0.01755881 0.01522183 0.01561093 0.01599932 0.01559615 0.01611876 0.01620126 0.01491332 0.01539016] mean value: 0.015859246253967285 key: score_time value: [0.01147318 0.01146603 0.01138854 0.01152349 0.01148939 0.0114677 0.01142216 0.01143479 0.01141429 0.01151896] mean value: 0.011459851264953613 key: test_mcc value: [0.08084521 0.24253563 0.54494926 0.48795004 0.79056942 0.33333333 0.31581015 0.1141228 0.19871795 0.52904327] mean value: 0.36378770402293453 key: train_mcc value: [0.61879835 0.7659626 0.71524747 0.51355259 0.60715853 0.71269665 0.74152227 0.65132718 0.73818656 0.73373869] mean value: 0.6798190892220928 key: test_accuracy value: [0.53846154 0.61538462 0.76923077 0.69230769 0.88461538 0.65384615 0.64 0.56 0.6 0.76 ] mean value: 0.6713846153846154 key: train_accuracy value: [0.78695652 0.87826087 0.85217391 0.70869565 0.7826087 0.84782609 0.86580087 0.8008658 0.85714286 0.86580087] mean value: 0.8246132128740824 key: test_fscore value: [0.45454545 0.54545455 0.75 0.76470588 0.86956522 0.70967742 0.68965517 0.42105263 0.61538462 0.75 ] mean value: 0.657004093847644 key: train_fscore value: [0.73796791 0.86792453 0.864 0.77441077 0.73404255 0.8627451 0.87649402 0.75531915 0.87258687 0.85972851] mean value: 0.8205219420596624 key: test_precision value: [0.55555556 0.66666667 0.81818182 0.61904762 1. 
0.61111111 0.58823529 0.57142857 0.61538462 0.81818182] mean value: 0.6863793069675422 key: train_precision value: [0.95833333 0.94845361 0.8 0.63186813 0.94520548 0.78571429 0.81481481 0.98611111 0.78472222 0.89622642] mean value: 0.8551449401857716 key: test_recall value: [0.38461538 0.46153846 0.69230769 1. 0.76923077 0.84615385 0.83333333 0.33333333 0.61538462 0.69230769] mean value: 0.6628205128205128 key: train_recall value: [0.6 0.8 0.93913043 1. 0.6 0.95652174 0.94827586 0.61206897 0.9826087 0.82608696] mean value: 0.8264692653673164 key: test_roc_auc value: [0.53846154 0.61538462 0.76923077 0.69230769 0.88461538 0.65384615 0.6474359 0.55128205 0.59935897 0.76282051] mean value: 0.671474358974359 key: train_roc_auc value: [0.78695652 0.87826087 0.85217391 0.70869565 0.7826087 0.84782609 0.86544228 0.80168666 0.85768366 0.86562969] mean value: 0.8246964017991005 key: test_jcc value: [0.29411765 0.375 0.6 0.61904762 0.76923077 0.55 0.52631579 0.26666667 0.44444444 0.6 ] mean value: 0.5044822935922008 key: train_jcc value: [0.58474576 0.76666667 0.76056338 0.63186813 0.57983193 0.75862069 0.78014184 0.60683761 0.7739726 0.75396825] mean value: 0.6997216871473853 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.13755894 0.11953592 0.11992717 0.12073612 0.12117648 0.11990905 0.12110257 0.12053275 0.11999488 0.11980319] mean value: 0.12202770709991455 key: score_time value: [0.01471901 0.01477623 0.01485658 0.01490092 0.01529408 0.01525807 0.01554537 0.01495647 0.0157783 0.01499677] mean value: 0.015108180046081544 key: test_mcc value: [0.54494926 0.23076923 0.56591646 0.47434165 0.77151675 0.6172134 0.44702443 0.28022427 0.36774959 0.67948718] mean value: 0.4979192215844021 key: train_mcc value: [1. 1. 0.99134183 1. 1. 1. 1. 0.99137867 0.99137931 1. ] mean value: 0.9974099805575383 key: test_accuracy value: [0.76923077 0.61538462 0.76923077 0.73076923 0.88461538 0.80769231 0.72 0.64 0.68 0.84 ] mean value: 0.7456923076923077 key: train_accuracy value: [1. 1. 0.99565217 1. 1. 1. 1. 0.995671 0.995671 1. ] mean value: 0.9986994165255035 key: test_fscore value: [0.78571429 0.61538462 0.72727273 0.75862069 0.88888889 0.81481481 0.66666667 0.57142857 0.66666667 0.84615385] mean value: 0.7341611772646256 key: train_fscore value: [1. 1. 0.99563319 1. 1. 1. 1. 0.99570815 0.995671 1. ] mean value: 0.998701233795036 key: test_precision value: [0.73333333 0.61538462 0.88888889 0.6875 0.85714286 0.78571429 0.77777778 0.66666667 0.72727273 0.84615385] mean value: 0.7585834998334998 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 0.99145299 0.99137931 1. 
] mean value: 0.9982832301797819 key: test_recall value: [0.84615385 0.61538462 0.61538462 0.84615385 0.92307692 0.84615385 0.58333333 0.5 0.61538462 0.84615385] mean value: 0.7237179487179487 key: train_recall value: [1. 1. 0.99130435 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9991304347826087 key: test_roc_auc value: [0.76923077 0.61538462 0.76923077 0.73076923 0.88461538 0.80769231 0.71474359 0.63461538 0.68269231 0.83974359] mean value: 0.7448717948717949 key: train_roc_auc value: [1. 1. 0.99565217 1. 1. 1. 1. 0.99565217 0.99568966 1. ] mean value: 0.99869940029985 key: test_jcc value: [0.64705882 0.44444444 0.57142857 0.61111111 0.8 0.6875 0.5 0.4 0.5 0.73333333] mean value: 0.5894876283846873 key: train_jcc value: [1. 1. 0.99130435 1. 1. 1. 1. 0.99145299 0.99137931 1. ] mean value: 0.9974136649623906 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05035281 0.05936909 0.05014372 0.0604763 0.056777 0.06902242 0.04624724 0.04836965 0.04742455 0.06249499] mean value: 0.05506777763366699 key: score_time value: [0.0195353 0.02203798 0.02992177 0.02294207 0.03088975 0.01927781 0.02395821 0.02438879 0.02904439 0.02309632] mean value: 0.024509239196777343 key: test_mcc value: [0.38924947 0.63245553 0.5 0.69230769 0.70064905 0.54494926 0.19611614 0.51923077 0.52904327 0.51923077] mean value: 0.5223231949125111 key: train_mcc value: [0.94839996 0.92261158 0.94911877 0.9742446 0.94001934 0.9658018 0.99137867 0.95674663 0.95674339 0.92237897] mean value: 0.95274437058133 key: test_accuracy value: [0.69230769 0.80769231 0.73076923 0.84615385 0.84615385 0.76923077 0.6 0.76 0.76 0.76 ] mean value: 0.7572307692307693 key: train_accuracy value: [0.97391304 0.96086957 0.97391304 0.98695652 0.96956522 0.9826087 0.995671 0.97835498 0.97835498 0.96103896] mean value: 0.9761246000376436 key: test_fscore value: [0.66666667 0.82758621 0.66666667 0.84615385 0.83333333 0.75 0.54545455 0.75 0.75 0.76923077] mean value: 0.740509203440238 key: train_fscore value: [0.97345133 0.96 0.97321429 0.98678414 0.96888889 0.98230088 0.99570815 0.97835498 0.97816594 0.96035242] mean value: 0.9757221022595252 key: test_precision value: [0.72727273 0.75 0.875 0.84615385 0.90909091 0.81818182 0.6 0.75 0.81818182 0.76923077] mean value: 0.7863111888111888 key: train_precision value: [0.99099099 0.98181818 1. 1. 0.99090909 1. 
0.99145299 0.9826087 0.98245614 0.97321429] mean value: 0.9893450376888592 key: test_recall value: [0.61538462 0.92307692 0.53846154 0.84615385 0.76923077 0.69230769 0.5 0.75 0.69230769 0.76923077] mean value: 0.7096153846153846 key: train_recall value: [0.95652174 0.93913043 0.94782609 0.97391304 0.94782609 0.96521739 1. 0.97413793 0.97391304 0.94782609] mean value: 0.9626311844077962 key: test_roc_auc value: [0.69230769 0.80769231 0.73076923 0.84615385 0.84615385 0.76923077 0.59615385 0.75961538 0.76282051 0.75961538] mean value: 0.757051282051282 key: train_roc_auc value: [0.97391304 0.96086957 0.97391304 0.98695652 0.96956522 0.9826087 0.99565217 0.97837331 0.97833583 0.96098201] mean value: 0.9761169415292353 key: test_jcc value: [0.5 0.70588235 0.5 0.73333333 0.71428571 0.6 0.375 0.6 0.6 0.625 ] mean value: 0.5953501400560224 key: train_jcc value: [0.94827586 0.92307692 0.94782609 0.97391304 0.93965517 0.96521739 0.99145299 0.95762712 0.95726496 0.92372881] mean value: 0.9528038360220151 MCC on Blind test: 0.44 Accuracy on Blind test: 0.71 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, 
colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.04538059 0.05333805 0.03113556 0.0357511 0.07322001 0.05139589 0.03071594 0.03451061 0.10031009 0.06317687] mean value: 0.05189347267150879 key: score_time value: [0.02160525 0.01283216 0.01277208 0.02457476 0.02156925 0.01298785 0.01267076 0.02030373 0.02255225 0.01272869] mean value: 0.017459678649902343 key: test_mcc value: [ 0. 0.46291005 0.38461538 0.38461538 0.46291005 0.3086067 0.36774959 -0.05337605 0.03846154 0.11613145] mean value: 0.2472624094332408 key: train_mcc value: [0.99134183 0.99134183 0.99134183 1. 0.99134183 0.99134183 0.99137867 0.99137867 0.98268366 0.99137931] mean value: 0.9913529444107553 key: test_accuracy value: [0.5 0.73076923 0.69230769 0.69230769 0.73076923 0.65384615 0.68 0.48 0.52 0.56 ] mean value: 0.624 key: train_accuracy value: [0.99565217 0.99565217 0.99565217 1. 0.99565217 0.99565217 0.995671 0.995671 0.99134199 0.995671 ] mean value: 0.9956615847920196 key: test_fscore value: [0.55172414 0.72 0.69230769 0.69230769 0.74074074 0.66666667 0.69230769 0.38095238 0.53846154 0.59259259] mean value: 0.626806113426803 key: train_fscore value: [0.995671 0.995671 0.995671 1. 0.995671 0.995671 0.99570815 0.99570815 0.99130435 0.995671 ] mean value: 0.9956746630864937 key: test_precision value: [0.5 0.75 0.69230769 0.69230769 0.71428571 0.64285714 0.64285714 0.44444444 0.53846154 0.57142857] mean value: 0.6188949938949939 key: train_precision value: [0.99137931 0.99137931 0.99137931 1. 
0.99137931 0.99137931 0.99145299 0.99145299 0.99130435 0.99137931] mean value: 0.9922486192801035 key: test_recall value: [0.61538462 0.69230769 0.69230769 0.69230769 0.76923077 0.69230769 0.75 0.33333333 0.53846154 0.61538462] mean value: 0.639102564102564 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99130435 1. ] mean value: 0.9991304347826087 key: test_roc_auc value: [0.5 0.73076923 0.69230769 0.69230769 0.73076923 0.65384615 0.68269231 0.47435897 0.51923077 0.55769231] mean value: 0.6233974358974359 key: train_roc_auc value: [0.99565217 0.99565217 0.99565217 1. 0.99565217 0.99565217 0.99565217 0.99565217 0.99134183 0.99568966] mean value: 0.9956596701649175 key: test_jcc value: [0.38095238 0.5625 0.52941176 0.52941176 0.58823529 0.5 0.52941176 0.23529412 0.36842105 0.42105263] mean value: 0.464469077104526 key: train_jcc value: [0.99137931 0.99137931 0.99137931 1. 0.99137931 0.99137931 0.99145299 0.99145299 0.98275862 0.99137931] mean value: 0.9913940465664604 MCC on Blind test: 0.29 Accuracy on Blind test: 0.65 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.44359875 0.41606069 0.41640043 0.4198699 0.41944647 0.4192605 0.42750263 0.41986775 0.41747665 0.42336249] mean value: 0.4222846269607544 key: score_time value: [0.00940156 0.00916147 0.00931311 0.0093689 0.01004791 0.00933456 0.00939679 0.0092485 0.00944662 0.00916076] mean value: 0.009388017654418945 key: test_mcc value: [0.63245553 0.69230769 0.5 0.69230769 0.77151675 0.69230769 0.67948718 0.28022427 0.52904327 0.76282051] mean value: 0.6232470588678629 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.80769231 0.84615385 0.73076923 0.84615385 0.88461538 0.84615385 0.84 0.64 0.76 0.88 ] mean value: 0.8081538461538461 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82758621 0.84615385 0.66666667 0.84615385 0.88888889 0.84615385 0.83333333 0.57142857 0.75 0.88 ] mean value: 0.7956365205675551 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.84615385 0.875 0.84615385 0.85714286 0.84615385 0.83333333 0.66666667 0.81818182 0.91666667] mean value: 0.825545288045288 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.92307692 0.84615385 0.53846154 0.84615385 0.92307692 0.84615385 0.83333333 0.5 0.69230769 0.84615385] mean value: 0.7794871794871795 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.80769231 0.84615385 0.73076923 0.84615385 0.88461538 0.84615385 0.83974359 0.63461538 0.76282051 0.88141026] mean value: 0.8080128205128205 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.70588235 0.73333333 0.5 0.73333333 0.8 0.73333333 0.71428571 0.4 0.6 0.78571429] mean value: 0.6705882352941176 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02251291 0.02159595 0.02179456 0.02176547 0.02233934 0.02202964 0.02235866 0.02211452 0.02222347 0.02275968] mean value: 0.022149419784545897 key: score_time value: [0.01358199 0.01356983 0.01375198 0.01353145 0.01821208 0.01508594 0.01521039 0.01506615 0.01855421 0.01217628] mean value: 0.014874029159545898 key: test_mcc value: [ 0.47434165 0.31622777 0.38461538 0.07784989 0.54494926 0.07784989 0.04516223 -0.12179487 0.03268602 -0.20645591] mean value: 0.16254313207270998 key: train_mcc value: [0.98275733 0.98275733 0.99134183 0.99134183 0.9742446 0.9658018 0.84693252 0.99137867 0.94113789 1.
] mean value: 0.9667693784024392 key: test_accuracy value: [0.73076923 0.65384615 0.69230769 0.53846154 0.76923077 0.53846154 0.52 0.44 0.52 0.4 ] mean value: 0.5803076923076923 key: train_accuracy value: [0.99130435 0.99130435 0.99565217 0.99565217 0.98695652 0.9826087 0.91774892 0.995671 0.96969697 1. ] mean value: 0.9826595143986449 key: test_fscore value: [0.75862069 0.68965517 0.69230769 0.57142857 0.75 0.57142857 0.53846154 0.41666667 0.57142857 0.44444444] mean value: 0.6004441918235022 key: train_fscore value: [0.99137931 0.99137931 0.995671 0.995671 0.98712446 0.98290598 0.92430279 0.99570815 0.97046414 1. ] mean value: 0.9834606136829099 key: test_precision value: [0.6875 0.625 0.69230769 0.53333333 0.81818182 0.53333333 0.5 0.41666667 0.53333333 0.42857143] mean value: 0.5768227605727606 key: train_precision value: [0.98290598 0.98290598 0.99137931 0.99137931 0.97457627 0.96638655 0.85925926 0.99145299 0.94262295 1. ] mean value: 0.9682868613841833 key: test_recall value: [0.84615385 0.76923077 0.69230769 0.61538462 0.69230769 0.61538462 0.58333333 0.41666667 0.61538462 0.46153846] mean value: 0.6307692307692307 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73076923 0.65384615 0.69230769 0.53846154 0.76923077 0.53846154 0.5224359 0.43910256 0.51602564 0.3974359 ] mean value: 0.5798076923076924 key: train_roc_auc value: [0.99130435 0.99130435 0.99565217 0.99565217 0.98695652 0.9826087 0.9173913 0.99565217 0.96982759 1. ] mean value: 0.9826349325337331 key: test_jcc value: [0.61111111 0.52631579 0.52941176 0.4 0.6 0.4 0.36842105 0.26315789 0.4 0.28571429] mean value: 0.43841318983733846 key: train_jcc value: [0.98290598 0.98290598 0.99137931 0.99137931 0.97457627 0.96638655 0.85925926 0.99145299 0.94262295 1. 
] mean value: 0.9682868613841833 MCC on Blind test: 0.06 Accuracy on Blind test: 0.54 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), 
('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02251625 0.03479052 0.01392817 0.01403356 0.01398563 0.01390481 0.03432369 0.0349257 0.03514314 0.03442669] mean value: 0.025197815895080567 key: score_time value: [0.02350426 0.01194096 0.01172352 0.01173067 0.01177621 0.01163101 0.02290154 0.02014542 0.02137446 0.02326202] mean value: 0.016999006271362305 key: test_mcc value: [0.5 0.40422604 0.54494926 0.63245553 0.69230769 0.46291005 0.36774959 0.19611614 0.12179487 0.35897436] mean value: 0.4281483531816259 key: train_mcc value: [0.76524632 0.80003025 0.77403011 0.76663895 0.79130435 0.78263829 0.80089955 0.89822939 0.78356699 0.78358321] mean value: 0.7946167390990425 key: test_accuracy value: [0.73076923 0.69230769 0.76923077 0.80769231 0.84615385 0.73076923 0.68 0.6 0.56 0.68 ] mean value: 0.7096923076923077 key: train_accuracy value: [0.8826087 0.9 0.88695652 0.8826087 0.89565217 0.89130435 0.9004329 0.94805195 0.89177489 0.89177489] mean value: 0.897116506681724 key: test_fscore value: [0.77419355 0.63636364 0.75 0.82758621 0.84615385 0.74074074 0.69230769 0.54545455 0.56 0.69230769] mean value: 0.7065107908611802 key: train_fscore value: [0.88311688 
0.9004329 0.88793103 0.87892377 0.89565217 0.89177489 0.9004329 0.95 0.89082969 0.89177489] mean value: 0.8970869137067558 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:176: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:179: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_precision value: [0.66666667 0.77777778 0.81818182 0.75 0.84615385 0.71428571 0.64285714 0.6 0.58333333 0.69230769] mean value: 0.7091563991563992 key: train_precision value: [0.87931034 0.89655172 0.88034188 0.90740741 0.89565217 0.88793103 0.90434783 0.91935484 0.89473684 0.88793103] mean value: 0.8953565106495263 key: test_recall value: [0.92307692 0.53846154 0.69230769 0.92307692 0.84615385 0.76923077 0.75 0.5 0.53846154 0.69230769] mean value: 0.7173076923076923 key: train_recall value: [0.88695652 0.90434783 0.89565217 0.85217391 0.89565217 0.89565217 0.89655172 0.98275862 0.88695652 0.89565217] mean value: 0.8992353823088456 key: test_roc_auc value: [0.73076923 0.69230769 0.76923077 0.80769231 0.84615385 0.73076923 0.68269231 0.59615385 0.56089744 0.67948718] mean value: 0.7096153846153846 key: train_roc_auc value: [0.8826087 0.9 0.88695652 0.8826087 0.89565217 0.89130435 0.90044978 0.94790105 0.89175412 0.8917916 ] mean value: 0.8971026986506747 key: test_jcc value: [0.63157895 0.46666667 0.6 0.70588235 0.73333333 0.58823529 0.52941176 0.375 0.38888889 0.52941176] mean value: 0.5548409012727898 key: train_jcc
value: [0.79069767 0.81889764 0.79844961 0.784 0.81102362 0.8046875 0.81889764 0.9047619 0.80314961 0.8046875 ] mean value: 0.8139252695520618 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.31885719 0.26284933 0.25009298 0.23583317 0.23779225 0.23270583 0.23397398 0.24211431 0.30185318 0.2662189 ] mean value: 0.25822911262512205 key: score_time value: [0.02399087 0.02392244 0.02324104 0.02295756 0.02134013 0.02078986 0.02291155 0.02077198 0.02287841 0.02306271] mean value: 0.022586655616760255 key: test_mcc value: [0.5 0.23354968 0.54494926 0.63245553 0.84615385 0.56591646 0.20645591 0.1990977 0.12179487 0.35897436] mean value: 0.4209347621861535 key: train_mcc value: [0.76524632 0.64350259 0.77403011 0.76663895 0.64369733 0.61776511 0.65447938 0.71429643 0.78356699 0.78358321] mean value: 0.7146806406714871 key: test_accuracy value: [0.73076923 0.61538462 0.76923077 0.80769231 0.92307692 0.76923077 0.6 0.6 0.56 0.68 ] mean value: 0.7055384615384616 key: train_accuracy value: [0.8826087 0.82173913 0.88695652 0.8826087 0.82173913 0.80869565 0.82683983 0.85714286 0.89177489 0.89177489] mean value: 0.8571880293619424 key: test_fscore value: [0.77419355 0.58333333 0.75 0.82758621 0.92307692 0.8 
0.61538462 0.5 0.56 0.69230769] mean value: 0.7025882319386213 key: train_fscore value: [0.88311688 0.82251082 0.88793103 0.87892377 0.82403433 0.81196581 0.83193277 0.8583691 0.89082969 0.89177489] mean value: 0.8581389111576094 key: test_precision value: [0.66666667 0.63636364 0.81818182 0.75 0.92307692 0.70588235 0.57142857 0.625 0.58333333 0.69230769] mean value: 0.6972240994299818 key: train_precision value: [0.87931034 0.81896552 0.88034188 0.90740741 0.81355932 0.79831933 0.81147541 0.85470085 0.89473684 0.88793103] mean value: 0.8546747940708186 key: test_recall value: [0.92307692 0.53846154 0.69230769 0.92307692 0.92307692 0.92307692 0.66666667 0.41666667 0.53846154 0.69230769] mean value: 0.7237179487179487 key: train_recall value: [0.88695652 0.82608696 0.89565217 0.85217391 0.83478261 0.82608696 0.85344828 0.86206897 0.88695652 0.89565217] mean value: 0.8619865067466267 key: test_roc_auc value: [0.73076923 0.61538462 0.76923077 0.80769231 0.92307692 0.76923077 0.6025641 0.59294872 0.56089744 0.67948718] mean value: 0.7051282051282052 key: train_roc_auc value: [0.8826087 0.82173913 0.88695652 0.8826087 0.82173913 0.80869565 0.82672414 0.85712144 0.89175412 0.8917916 ] mean value: 0.8571739130434783 key: test_jcc value: [0.63157895 0.41176471 0.6 0.70588235 0.85714286 0.66666667 0.44444444 0.33333333 0.38888889 0.52941176] mean value: 0.5569113961374024 key: train_jcc value: [0.79069767 0.69852941 0.79844961 0.784 0.70072993 0.68345324 0.71223022 0.7518797 0.80314961 0.8046875 ] mean value: 0.7527806884378454 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03337646 0.03352547 0.03370428 0.03442836 0.03449273 0.03441572 0.03380013 0.04961681 0.03415537 0.0339179 ] mean value: 0.03554332256317139 key: score_time value: [0.01657343 0.01375055 0.01404023 0.01396298 0.01412797 0.01399684 0.01417994 0.01409483 0.01432037 0.01417518] mean value: 0.014322233200073243 key: test_mcc value: [0.62994079 0.57265629 0.49612132 0.61925228 0.48954403 0.53006813 0.29960206 0.67916667 0.55 0.58316015] mean value: 0.5449511717775465 key: train_mcc value: [0.66430266 0.7071609 0.72953394 0.69442482 0.72959417 0.72260223 0.70820669 0.70820669 0.71529889 0.72256008] mean value: 0.710189106315 key: test_accuracy value: [0.8125 0.78125 0.74193548 0.80645161 0.74193548 0.74193548 0.64516129 0.83870968 0.77419355 0.77419355] mean value: 0.7658266129032258 key: train_accuracy value: [0.83214286 0.85357143 0.86476868 0.84697509 0.86476868 0.86120996 0.85409253 0.85409253 0.85765125 0.86120996] mean value: 0.8550482968988307 key: test_fscore value: [0.8 0.8 0.69230769 0.8125 0.75 0.77777778 0.62068966 0.83870968 0.77419355 0.74074074] mean value: 0.7606919091805077 key: train_fscore value: [0.83274021 0.85304659 0.86524823 0.84476534 0.86619718 0.86021505 0.85409253 0.85409253 0.85714286 0.85920578] mean value: 0.8546746301974811 key: test_precision value: [0.85714286 0.73684211 0.81818182 0.76470588 0.70588235 0.66666667 0.69230769 0.86666667 0.8 0.90909091] mean value: 0.7817486950613886 key: train_precision value: [0.82978723 0.85611511 0.86524823 0.86029412 0.86013986 0.86956522 0.85106383 0.85106383 0.85714286 0.86861314] 
mean value: 0.8569033419488257 key: test_recall value: [0.75 0.875 0.6 0.86666667 0.8 0.93333333 0.5625 0.8125 0.75 0.625 ] mean value: 0.7575000000000001 key: train_recall value: [0.83571429 0.85 0.86524823 0.82978723 0.87234043 0.85106383 0.85714286 0.85714286 0.85714286 0.85 ] mean value: 0.8525582573454914 key: test_roc_auc value: [0.8125 0.78125 0.7375 0.80833333 0.74375 0.74791667 0.64791667 0.83958333 0.775 0.77916667] mean value: 0.7672916666666667 key: train_roc_auc value: [0.83214286 0.85357143 0.86476697 0.84703647 0.86474164 0.8612462 0.85410334 0.85410334 0.85764944 0.86117021] mean value: 0.8550531914893618 key: test_jcc value: [0.66666667 0.66666667 0.52941176 0.68421053 0.6 0.63636364 0.45 0.72222222 0.63157895 0.58823529] mean value: 0.6175355724426932 key: train_jcc value: [0.71341463 0.74375 0.7625 0.73125 0.76397516 0.75471698 0.74534161 0.74534161 0.75 0.75316456] mean value: 0.746345455733361 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])
key: fit_time value: [0.7325387 0.89071441 0.77209663 0.88231349 0.76088715 0.76609707 0.87763047 0.75076795 0.75155282 0.94855165] mean value: 0.8133150339126587
key: score_time value: [0.01201606 0.01187754 0.01192522 0.01188445 0.01186037 0.01193094 0.01192117 0.01198769 0.01191378 0.01193261] mean value: 0.011924982070922852
key: test_mcc value: [0.5 0.57265629 0.42321607 0.55 0.50443936 0.4770843 0.29960206 0.49612132 0.61925228 0.58316015] mean value: 0.502553182374343
key: train_mcc value: [0.62882815 0.62914948 0.70131788 0.63736394 0.5960184 0.65835866 0.63717891 0.60897119 0.60936188 0.71538579] mean value: 0.6421934266319843
key: test_accuracy value: [0.75 0.78125 0.70967742 0.77419355 0.74193548 0.70967742 0.64516129 0.74193548 0.80645161 0.77419355] mean value: 0.7434475806451613
key: train_accuracy value: [0.81428571 0.81428571 0.85053381 0.81850534 0.79715302 0.82918149 0.81850534 0.80427046 0.80427046 0.85765125] mean value:
0.8208642602948653
key: test_fscore value: [0.75 0.8 0.66666667 0.77419355 0.76470588 0.75675676 0.62068966 0.77777778 0.8 0.74074074] mean value: 0.7451531027854393
key: train_fscore value: [0.81690141 0.81818182 0.85314685 0.82229965 0.80546075 0.82978723 0.81978799 0.80701754 0.80836237 0.85815603] mean value: 0.8239101643675262
key: test_precision value: [0.75 0.73684211 0.75 0.75 0.68421053 0.63636364 0.69230769 0.7 0.85714286 0.90909091] mean value: 0.7465957726484043
key: train_precision value: [0.80555556 0.80136986 0.84137931 0.80821918 0.77631579 0.82978723 0.81118881 0.79310345 0.78911565 0.85211268] mean value: 0.8108147512292025
key: test_recall value: [0.75 0.875 0.6 0.8 0.86666667 0.93333333 0.5625 0.875 0.75 0.625 ] mean value: 0.76375
key: train_recall value: [0.82857143 0.83571429 0.86524823 0.83687943 0.83687943 0.82978723 0.82857143 0.82142857 0.82857143 0.86428571] mean value: 0.8375937183383992
key: test_roc_auc value: [0.75 0.78125 0.70625 0.775 0.74583333 0.71666667 0.64791667 0.7375 0.80833333 0.77916667] mean value: 0.7447916666666667
key: train_roc_auc value: [0.81428571 0.81428571 0.85048126 0.81843972 0.79701114 0.82917933 0.81854103 0.80433131 0.80435664 0.85767477] mean value: 0.8208586626139818
key: test_jcc value: [0.6 0.66666667 0.5 0.63157895 0.61904762 0.60869565 0.45 0.63636364 0.66666667 0.58823529] mean value: 0.596725448240457
key: train_jcc value: [0.69047619 0.69230769 0.74390244 0.69822485 0.67428571 0.70909091 0.69461078 0.67647059 0.67836257 0.7515528 ] mean value: 0.7009284532064781
MCC on Blind test: 0.32
Accuracy on Blind test: 0.66
Model_name: Gaussian NB
Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())])
key: fit_time value: [0.01314902 0.01112795 0.00962234 0.00998735 0.00902677 0.00961423 0.00921035 0.00920057 0.00901651 0.00913954] mean value: 0.009909462928771973
key: score_time value: [0.01189709 0.00921988 0.00953174 0.0086832 0.00859284 0.00856686 0.00860071 0.00853753 0.00849509 0.00902152] mean value: 0.009114646911621093
key: test_mcc value: [0.25819889 0.53935989 0.23939495 0.74896053 0.05046084 0.23939495 0.28870546 0.29069387 0.48333333 0.74896053] mean value: 0.38874632340433807
key: train_mcc value: [0.49726525 0.47977484 0.50204455 0.50192607 0.51324925 0.51683781 0.47539896 0.51888435 0.45793458 0.49211483] mean value: 0.49554304714846314
key: test_accuracy value: [0.625 0.75 0.61290323 0.87096774 0.51612903 0.61290323 0.64516129 0.64516129 0.74193548 0.87096774] mean value: 0.6891129032258064
key: train_accuracy value: [0.74285714 0.73571429 0.74377224 0.75088968 0.75088968 0.7544484 0.73309609 0.7544484 0.70818505 0.7366548 ] mean value: 0.7410955770208439
key: test_fscore value: [0.66666667 0.78947368 0.64705882 0.875 0.59459459 0.64705882 0.66666667 0.68571429 0.75 0.86666667] mean value: 0.718890021157823
key: train_fscore value: [0.76774194 0.75816993 0.7721519 0.75524476 0.77564103 0.7752443 0.75570033 0.7752443 0.75739645 0.7672956 ] mean value: 0.7659830522014204
key: test_precision value: [0.6 0.68181818 0.57894737 0.82352941 0.5 0.57894737 0.64705882 0.63157895 0.75 0.92857143] mean value: 0.6720451529894255
key: train_precision value: [0.7 0.69879518 0.69714286 0.74482759 0.70760234 0.71686747 0.69461078 0.71257485 0.64646465 0.68539326] mean value:
0.7004278966767578
key: test_recall value: [0.75 0.9375 0.73333333 0.93333333 0.73333333 0.73333333 0.6875 0.75 0.75 0.8125 ] mean value: 0.7820833333333334
key: train_recall value: [0.85 0.82857143 0.86524823 0.76595745 0.85815603 0.84397163 0.82857143 0.85 0.91428571 0.87142857] mean value: 0.8476190476190476
key: test_roc_auc value: [0.625 0.75 0.61666667 0.87291667 0.52291667 0.61666667 0.64375 0.64166667 0.74166667 0.87291667] mean value: 0.6904166666666667
key: train_roc_auc value: [0.74285714 0.73571429 0.7433384 0.75083587 0.75050659 0.75412867 0.73343465 0.75478723 0.70891591 0.73713273] mean value: 0.7411651469098278
key: test_jcc value: [0.5 0.65217391 0.47826087 0.77777778 0.42307692 0.47826087 0.5 0.52173913 0.6 0.76470588] mean value: 0.5695995365816338
key: train_jcc value: [0.62303665 0.61052632 0.62886598 0.60674157 0.63350785 0.63297872 0.60732984 0.63297872 0.60952381 0.62244898] mean value: 0.6207938449678521
MCC on Blind test: 0.28
Accuracy on Blind test: 0.65
Model_name: Naive Bayes
Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ...
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00933599 0.00952888 0.00928903 0.00916195 0.00920892 0.00923371 0.00950384 0.00935936 0.0097599 0.00947571] mean value: 0.00938572883605957 key: score_time value: [0.00861382 0.00869727 0.00851321 0.00864816 0.00889111 0.00874734 0.00866508 0.00860071 0.00856805 0.00867391] mean value: 0.008661866188049316 key: test_mcc value: [0.56360186 0.48653363 0.15899721 0.5612264 0.4770843 0.37191715 0.28870546 0.42321607 0.09583333 0.58316015] mean value: 0.40102755595919914 key: train_mcc value: [0.57430732 0.53803891 0.55459753 0.49900055 0.55953199 0.56745262 0.55493536 0.58009119 0.54153517 0.53306083] mean value: 0.5502551476643553 key: test_accuracy value: [0.78125 0.71875 0.58064516 0.77419355 0.70967742 0.67741935 0.64516129 0.70967742 0.5483871 0.77419355] mean value: 0.6919354838709677 key: train_accuracy value: [0.78571429 0.76785714 0.77580071 0.74733096 0.77935943 0.78291815 0.77580071 0.79003559 0.76868327 0.76512456] mean value: 0.7738624809354346 key: test_fscore value: [0.77419355 0.76923077 0.55172414 0.78787879 0.75675676 0.70588235 0.66666667 0.74285714 0.5625 0.74074074] mean value: 0.7058430903390172 key: train_fscore value: [0.79591837 0.778157 0.78787879 0.7641196 0.7862069 0.79180887 0.78644068 0.79003559 0.78114478 0.7755102 ] mean value: 0.7837220773794649 key: test_precision value: [0.8 0.65217391 0.57142857 0.72222222 0.63636364 0.63157895 0.64705882 0.68421053 0.5625 0.90909091] mean value: 0.681662754936244 key: train_precision value: [0.75974026 0.74509804 0.75 0.71875 0.76510067 0.76315789 0.7483871 0.78723404 0.7388535 0.74025974] mean value: 
0.7516581247605566 key: test_recall value: [0.75 0.9375 0.53333333 0.86666667 0.93333333 0.8 0.6875 0.8125 0.5625 0.625 ] mean value: 0.7508333333333334 key: train_recall value: [0.83571429 0.81428571 0.82978723 0.81560284 0.80851064 0.82269504 0.82857143 0.79285714 0.82857143 0.81428571] mean value: 0.8190881458966566 key: test_roc_auc value: [0.78125 0.71875 0.57916667 0.77708333 0.71666667 0.68125 0.64375 0.70625 0.54791667 0.77916667] mean value: 0.693125 key: train_roc_auc value: [0.78571429 0.76785714 0.7756079 0.74708713 0.77925532 0.78277609 0.77598784 0.79004559 0.76889564 0.76529889] mean value: 0.7738525835866261 key: test_jcc value: [0.63157895 0.625 0.38095238 0.65 0.60869565 0.54545455 0.5 0.59090909 0.39130435 0.58823529] mean value: 0.5512130258802086 key: train_jcc value: [0.66101695 0.63687151 0.65 0.61827957 0.64772727 0.65536723 0.64804469 0.65294118 0.64088398 0.63333333] mean value: 0.6444465712232499 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00932002 0.01014709 0.00882673 0.00991416 0.00968051 0.00960183 0.00987482 0.00993252 0.00893021 0.00874734] mean value: 0.009497523307800293 key: score_time value: [0.01476312 0.01438737 0.01490664 0.0168004 0.01892352 0.01665711 0.01663828 0.011374 0.01400232 0.01244378] mean value: 0.015089654922485351 key: test_mcc value: [ 0.37796447 0.12598816 0.02928896 0.28870546 -0.02227177 0.29960206 0.225 0.15899721 0.44824996 0.19266866] mean value: 0.21241931692556495 key: train_mcc value: [0.50005103 0.56441531 0.58765691 0.57302802 0.60162197 0.5956161 0.61886765 0.6014592 0.53738602 0.53738602] mean value: 0.5717488216987027 key: test_accuracy value: [0.6875 0.5625 0.51612903 0.64516129 0.48387097 0.64516129 0.61290323 0.58064516 0.70967742 0.58064516] mean value: 0.6024193548387097 key: train_accuracy value: [0.75 0.78214286 0.79359431 0.78647687 0.80071174 0.79715302 0.80782918 0.80071174 0.76868327 0.76868327] mean value: 0.7855986273512964 key: test_fscore value: [0.70588235 0.58823529 0.48275862 0.62068966 0.55555556 0.66666667 0.625 0.60606061 0.66666667 0.48 ] mean value: 0.5997515417870387 key: train_fscore value: [0.75177305 0.7844523 0.79861111 0.78571429 0.8041958 0.79120879 0.81632653 0.79856115 0.76868327 0.76868327] mean value: 0.7868209568429256 key: test_precision value: [0.66666667 0.55555556 0.5 0.64285714 0.47619048 0.61111111 0.625 0.58823529 0.81818182 0.66666667] mean value: 0.6150464731347084 key: train_precision value: [0.74647887 0.77622378 0.78231293 0.79136691 0.79310345 0.81818182 0.77922078 0.80434783 0.76595745 0.76595745] mean 
value: 0.7823151246490538 key: test_recall value: [0.75 0.625 0.46666667 0.6 0.66666667 0.73333333 0.625 0.625 0.5625 0.375 ] mean value: 0.6029166666666667 key: train_recall value: [0.75714286 0.79285714 0.81560284 0.78014184 0.81560284 0.76595745 0.85714286 0.79285714 0.77142857 0.77142857] mean value: 0.7920162107396149 key: test_roc_auc value: [0.6875 0.5625 0.51458333 0.64375 0.48958333 0.64791667 0.6125 0.57916667 0.71458333 0.5875 ] mean value: 0.6039583333333334 key: train_roc_auc value: [0.75 0.78214286 0.7935157 0.78649949 0.80065856 0.79726444 0.80800405 0.80068389 0.76869301 0.76869301] mean value: 0.7856155015197568 key: test_jcc value: [0.54545455 0.41666667 0.31818182 0.45 0.38461538 0.5 0.45454545 0.43478261 0.5 0.31578947] mean value: 0.4320035951843732 key: train_jcc value: [0.60227273 0.64534884 0.66473988 0.64705882 0.67251462 0.65454545 0.68965517 0.66467066 0.62427746 0.62427746] mean value: 0.6489361091224226 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01523018 0.01354218 0.01364875 0.0138216 0.0136342 0.01381826 0.01368165 0.01369882 0.01525903 0.01383352] mean value: 0.014016819000244141 key: score_time value: [0.00996876 0.00985384 0.00985217 0.00979042 0.00975299 0.00979042 0.0098505 0.0107069 0.00991988 0.01005197] mean value: 0.009953784942626952 key: test_mcc value: [0.625 0.53935989 0.42083333 0.63696156 0.50443936 0.44824996 0.35983579 0.42321607 0.63696156 0.63696156] mean value: 0.5231819080590403 key: train_mcc value: [0.71428571 0.73618394 0.71640396 0.71599514 0.70901046 0.73761389 0.68713898 0.73015914 0.70819191 0.72980243] mean value: 0.7184785546282559 key: test_accuracy value: [0.8125 0.75 0.70967742 0.80645161 0.74193548 0.70967742 0.67741935 0.70967742 0.80645161 0.80645161] mean value: 0.7530241935483871 key: train_accuracy value: [0.85714286 0.86785714 0.85765125 0.85765125 0.85409253 0.8683274 0.84341637 0.86476868 0.85409253 0.86476868] mean value: 0.8589768683274022 key: test_fscore value: [0.8125 0.78947368 0.70967742 0.82352941 0.76470588 0.74285714 0.66666667 0.74285714 0.78571429 0.78571429] mean value: 0.7623695921492536 key: train_fscore value: [0.85714286 0.86545455 0.86206897 0.85507246 0.85813149 0.86545455 0.84507042 0.86131387 0.85304659 0.86231884] mean value: 0.8585074591936718 key: test_precision value: [0.8125 0.68181818 0.6875 0.73684211 0.68421053 0.65 0.71428571 0.68421053 0.91666667 0.91666667] mean value: 0.7484700387331966 key: train_precision value: [0.85714286 0.88148148 0.83892617 0.87407407 0.83783784 0.8880597 0.83333333 0.88059701 0.85611511 0.875 ] mean 
value: 0.8622567582697808 key: test_recall value: [0.8125 0.9375 0.73333333 0.93333333 0.86666667 0.86666667 0.625 0.8125 0.6875 0.6875 ] mean value: 0.79625 key: train_recall value: [0.85714286 0.85 0.88652482 0.83687943 0.87943262 0.84397163 0.85714286 0.84285714 0.85 0.85 ] mean value: 0.8553951367781155 key: test_roc_auc value: [0.8125 0.75 0.71041667 0.81041667 0.74583333 0.71458333 0.67916667 0.70625 0.81041667 0.81041667] mean value: 0.755 key: train_roc_auc value: [0.85714286 0.86785714 0.85754813 0.85772543 0.85400203 0.86841439 0.84346505 0.86469098 0.85407801 0.86471631] mean value: 0.8589640324214792 key: test_jcc value: [0.68421053 0.65217391 0.55 0.7 0.61904762 0.59090909 0.5 0.59090909 0.64705882 0.64705882] mean value: 0.6181367887283893 key: train_jcc value: [0.75 0.76282051 0.75757576 0.74683544 0.75151515 0.76282051 0.73170732 0.75641026 0.74375 0.75796178] mean value: 0.7521396734692827 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.16019416 1.35749841 1.18764305 1.23864603 1.30716753 1.15232587 1.29647064 1.16228747 1.28212929 1.14609051] mean value: 1.229045295715332 key: score_time value: [0.01337218 0.01464963 0.02091813 0.01439071 0.01440692 0.01583743 0.0146606 0.01481366 0.01492953 0.01485419] mean value: 0.015283298492431641 key: test_mcc value: [0.69991324 0.50395263 0.61608311 0.48954403 0.35983579 0.57104024 0.58316015 0.48527095 0.67916667 0.69203857] mean value: 0.5680005373888458 key: train_mcc value: [0.99288247 0.97859639 0.99290744 0.98586555 1. 0.98576494 0.9929078 0.98576494 0.9929078 0.98576494] mean value: 0.9893362290587903 key: test_accuracy value: [0.84375 0.75 0.80645161 0.74193548 0.67741935 0.74193548 0.77419355 0.74193548 0.83870968 0.83870968] mean value: 0.7755040322580645 key: train_accuracy value: [0.99642857 0.98928571 0.99644128 0.99288256 1. 0.99288256 0.99644128 0.99288256 0.99644128 0.99288256] mean value: 0.9946568378240976 key: test_fscore value: [0.82758621 0.76470588 0.78571429 0.75 0.6875 0.78947368 0.74074074 0.76470588 0.83870968 0.82758621] mean value: 0.7776722566583893 key: train_fscore value: [0.99644128 0.98924731 0.99646643 0.99285714 1. 
0.9929078 0.99644128 0.99285714 0.99644128 0.99285714] mean value: 0.9946516816329601 key: test_precision value: [0.92307692 0.72222222 0.84615385 0.70588235 0.64705882 0.65217391 0.90909091 0.72222222 0.86666667 0.92307692] mean value: 0.7917624802023779 key: train_precision value: [0.9929078 0.99280576 0.99295775 1. 1. 0.9929078 0.9929078 0.99285714 0.9929078 0.99285714] mean value: 0.9943108993262602 key: test_recall value: [0.75 0.8125 0.73333333 0.8 0.73333333 1. 0.625 0.8125 0.8125 0.75 ] mean value: 0.7829166666666667 key: train_recall value: [1. 0.98571429 1. 0.9858156 1. 0.9929078 1. 0.99285714 1. 0.99285714] mean value: 0.9950151975683891 key: test_roc_auc value: [0.84375 0.75 0.80416667 0.74375 0.67916667 0.75 0.77916667 0.73958333 0.83958333 0.84166667] mean value: 0.7770833333333333 key: train_roc_auc value: [0.99642857 0.98928571 0.99642857 0.9929078 1. 0.99288247 0.9964539 0.99288247 0.9964539 0.99288247] mean value: 0.9946605876393111 key: test_jcc value: [0.70588235 0.61904762 0.64705882 0.6 0.52380952 0.65217391 0.58823529 0.61904762 0.72222222 0.70588235] mean value: 0.6383359720699875 key: train_jcc value: [0.9929078 0.9787234 0.99295775 0.9858156 1. 
0.98591549 0.9929078 0.9858156 0.9929078 0.9858156 ] mean value: 0.9893766856457896 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02894592 0.0212357 0.02063727 0.02075434 0.02135015 0.01837444 0.02223134 0.02115107 0.01944661 0.02165294] mean value: 0.021577978134155275 key: score_time value: [0.0115757 0.00920677 0.00933361 0.00870323 0.00865054 0.00864983 0.00901675 0.00935531 0.00854015 0.00858498] mean value: 0.009161686897277832 key: test_mcc value: [0.62994079 0.68884672 0.67916667 0.67916667 0.48527095 0.63696156 0.53006813 0.22364661 0.5612264 0.55 ] mean value: 0.5664294488138983 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.84375 0.83870968 0.83870968 0.74193548 0.80645161 0.74193548 0.61290323 0.77419355 0.77419355] mean value: 0.7785282258064516 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 0.84848485 0.83870968 0.83870968 0.71428571 0.82352941 0.69230769 0.64705882 0.75862069 0.77419355] mean value: 0.7759429495018058 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.77777778 0.82352941 0.8125 0.8125 0.76923077 0.73684211 0.9 0.61111111 0.84615385 0.8 ] mean value: 0.7889645021301368 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.875 0.86666667 0.86666667 0.66666667 0.93333333 0.5625 0.6875 0.6875 0.75 ] mean value: 0.7770833333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 0.84375 0.83958333 0.83958333 0.73958333 0.81041667 0.74791667 0.61041667 0.77708333 0.775 ] mean value: 0.7795833333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 0.73684211 0.72222222 0.72222222 0.55555556 0.7 0.52941176 0.47826087 0.61111111 0.63157895] mean value: 0.638720479801379 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10602975 0.10733938 0.10524845 0.10616469 0.1080482 0.10535192 0.10625601 0.10542488 0.10674691 0.10701108] mean value: 0.10636212825775146 key: score_time value: [0.01797223 0.01859069 0.01800942 0.01755714 0.01740718 0.01746559 0.01737189 0.01737976 0.01745605 0.01783395] mean value: 0.01770439147949219 key: test_mcc value: [0.56360186 0.72374686 0.54812195 0.69203857 0.44824996 0.4770843 0.5612264 0.49612132 0.61925228 0.57104024] mean value: 0.5700483750913896 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78125 0.84375 0.77419355 0.83870968 0.70967742 0.70967742 0.77419355 0.74193548 0.80645161 0.74193548] mean value: 0.7721774193548387 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.78787879 0.86486486 0.75862069 0.84848485 0.74285714 0.75675676 0.75862069 0.77777778 0.8 0.66666667] mean value: 0.7762528224597189 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76470588 0.76190476 0.78571429 0.77777778 0.65 0.63636364 0.84615385 0.7 0.85714286 1. ] mean value: 0.7779763047410106 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 1. 0.73333333 0.93333333 0.86666667 0.93333333 0.6875 0.875 0.75 0.5 ] mean value: 0.8091666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.78125 0.84375 0.77291667 0.84166667 0.71458333 0.71666667 0.77708333 0.7375 0.80833333 0.75 ] mean value: 0.774375 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.65 0.76190476 0.61111111 0.73684211 0.59090909 0.60869565 0.61111111 0.63636364 0.66666667 0.5 ] mean value: 0.6373604135503449 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.3 Accuracy on Blind test: 0.66 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01085496 0.00925255 0.01008058 0.00925827 0.00935721 0.00931215 0.0092001 0.0097034 0.00947332 0.01037645] mean value: 0.009686899185180665 key: score_time value: [0.00925589 0.00849748 0.0090673 0.00847626 0.00856829 0.00894523 0.00853539 0.00952673 0.00882292 0.00931621] mean value: 0.008901166915893554 key: test_mcc value: [0.18786729 0.46056619 0.29069387 0.225 0.42083333 0.23012754 0.25389818 0.55573827 0.29166667 0.58316015] mean value: 0.349955147894385 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.59375 0.71875 0.64516129 0.61290323 0.70967742 0.61290323 0.61290323 0.77419355 0.64516129 0.77419355] mean value: 0.6699596774193548 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.58064516 0.66666667 0.59259259 0.6 0.70967742 0.625 0.53846154 0.8 0.64516129 0.74074074] mean value: 0.649894540942928 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6 0.81818182 0.66666667 0.6 0.6875 0.58823529 0.7 0.73684211 0.66666667 0.90909091] mean value: 0.6973183459986866 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5625 0.5625 0.53333333 0.6 0.73333333 0.66666667 0.4375 0.875 0.625 0.625 ] mean value: 0.6220833333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.59375 0.71875 0.64166667 0.6125 0.71041667 0.61458333 0.61875 0.77083333 0.64583333 0.77916667] mean value: 0.670625 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.40909091 0.5 0.42105263 0.42857143 0.55 0.45454545 0.36842105 0.66666667 0.47619048 0.58823529] mean value: 0.48627739133931086 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.55 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3.
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.45871091 1.44358611 1.44652629 1.49799991 1.45427561 1.44756174 1.44085765 1.46252918 1.51522851 1.45094728] mean value: 1.4618223190307618 key: score_time value: [0.09098411 0.09088707 0.09599185 0.09621763 0.09140348 0.0917809 0.0976913 0.09258533 0.09712076 0.0905304 ] mean value: 0.09351928234100342 key: test_mcc value: [0.625 0.75 0.74166667 0.6125 0.29960206 0.53006813 0.50443936 0.48527095 0.69203857 0.63696156] mean value: 0.5877547290151389 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.875 0.87096774 0.80645161 0.64516129 0.74193548 0.74193548 0.74193548 0.83870968 0.80645161] mean value: 0.7881048387096774 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8125 0.875 0.86666667 0.8 0.66666667 0.77777778 0.71428571 0.76470588 0.82758621 0.78571429] mean value: 0.7890903200360604 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.8125 0.875 0.86666667 0.8 0.61111111 0.66666667 0.83333333 0.72222222 0.92307692 0.91666667] mean value: 0.802724358974359 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.875 0.86666667 0.8 0.73333333 0.93333333 0.625 0.8125 0.75 0.6875 ] mean value: 0.7895833333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 0.875 0.87083333 0.80625 0.64791667 0.74791667 0.74583333 0.73958333 0.84166667 0.81041667] mean value: 0.7897916666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68421053 0.77777778 0.76470588 0.66666667 0.5 0.63636364 0.55555556 0.61904762 0.70588235 0.64705882] mean value: 0.6557268840550574 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.87758327 0.92826319 0.93050742 0.84618402 0.95714593 0.92273808 0.95817399 0.91892672 0.88999343 0.88455796] mean value: 0.9114073991775513 key: score_time value: [0.21835947 0.24447083 0.26226187 0.22112417 0.26000118 0.18407679 0.26429701 0.2288034 0.23047781 0.22172642] mean value: 0.23355989456176757 key: test_mcc value: [0.68884672 0.875 0.80753845 0.61925228 0.50443936 0.42352151 0.61925228 0.4184137 0.69203857 0.58316015] mean value: 0.6231463022247349 key: train_mcc value: [0.90009185 0.91428571 0.87919331 0.90767208 0.8934327 0.91458967 0.89344886 0.89326241 0.91458967 0.9219233 ] mean value: 0.9032489559187864 key: test_accuracy value: [0.84375 0.9375 0.90322581 0.80645161 0.74193548 0.67741935 0.80645161 0.70967742 0.83870968 0.77419355] mean value: 0.8039314516129032 key: train_accuracy value: [0.95 0.95714286 0.93950178 0.95373665 0.94661922 0.95729537 0.94661922 0.94661922 0.95729537 0.96085409] mean value: 0.9515683782409761 key: test_fscore value: [0.83870968 0.9375 0.89655172 0.8125 0.76470588 0.73684211 0.8 0.72727273 0.82758621 0.74074074] mean value: 0.8082409064083405 key: train_fscore value: [0.95035461 0.95714286 0.94035088 0.95438596 0.94736842 0.95744681 0.94699647 0.94661922 0.95714286 0.96113074] mean value: 0.9518938821445742 key: test_precision value: [0.86666667 0.9375 0.92857143 0.76470588 0.68421053 0.60869565 0.85714286 0.70588235 0.92307692 0.90909091] mean value: 0.8185543198332604 
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: train_precision value: [0.94366197 0.95714286 0.93055556 0.94444444 0.9375 0.95744681 0.93706294 0.94326241 0.95714286 0.95104895] mean value: 0.9459268794086745 key: test_recall value: [0.8125 0.9375 0.86666667 0.86666667 0.86666667 0.93333333 0.75 0.75 0.75 0.625 ] mean value: 0.8158333333333333 key: train_recall value: [0.95714286 0.95714286 0.95035461 0.96453901 0.95744681 0.95744681 0.95714286 0.95 0.95714286 0.97142857] mean value: 0.9579787234042554 key: test_roc_auc value: [0.84375 0.9375 0.90208333 0.80833333 0.74583333 0.68541667 0.80833333 0.70833333 0.84166667 0.77916667] mean value: 0.8060416666666667 key: train_roc_auc value: [0.95 0.95714286 0.93946302 0.95369807 0.94658055 0.95729483 0.94665653 0.94663121 0.95729483 0.96089159] mean value: 0.951565349544073 key: test_jcc value: [0.72222222 0.88235294 0.8125 0.68421053 0.61904762 0.58333333 0.66666667 0.57142857 0.70588235 0.58823529] mean value: 0.6835879527249497 key: train_jcc value: [0.90540541 0.91780822 0.88741722 0.91275168 0.9 0.91836735 0.89932886 0.89864865 0.91780822 0.92517007] mean value: 0.9082705662832002 MCC on Blind test: 0.48 Accuracy on Blind test: 0.74 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02361965 0.00960827 0.01067877 0.00956154 0.01037812 0.01068234 0.0104816 0.00967288 0.00952721 0.0107491 ] mean value: 0.01149594783782959 key: score_time value: [0.01264954 0.00892997 0.0097971 0.00886059 0.00994897 0.0096302 0.00885224 0.00893092 0.00966692 0.01322079] mean value: 0.010048723220825196 key: test_mcc value: [0.56360186 0.48653363 0.15899721 0.5612264 0.4770843 0.37191715 0.28870546 0.42321607 0.09583333 0.58316015] mean value: 0.40102755595919914 key: train_mcc value: [0.57430732 0.53803891 0.55459753 0.49900055 0.55953199 0.56745262 0.55493536 0.58009119 0.54153517 0.53306083] mean value: 0.5502551476643553 key: test_accuracy value: [0.78125 0.71875 0.58064516 0.77419355 0.70967742 0.67741935 0.64516129 0.70967742 0.5483871 0.77419355] mean value: 0.6919354838709677 key: train_accuracy value: [0.78571429 0.76785714 0.77580071 0.74733096 0.77935943 0.78291815 0.77580071 0.79003559 0.76868327 0.76512456] mean value: 0.7738624809354346 key: test_fscore value: [0.77419355 0.76923077 0.55172414 0.78787879 0.75675676 0.70588235 0.66666667 0.74285714 0.5625 0.74074074] mean value: 0.7058430903390172 key: train_fscore value: [0.79591837 0.778157 0.78787879 0.7641196 0.7862069 0.79180887 0.78644068 0.79003559 0.78114478 0.7755102 ] mean value: 0.7837220773794649 key: test_precision value: [0.8 0.65217391 0.57142857 0.72222222 0.63636364 0.63157895 0.64705882 0.68421053 0.5625 0.90909091] mean value: 0.681662754936244 key: train_precision value: [0.75974026 0.74509804 0.75 0.71875 0.76510067 0.76315789 0.7483871 0.78723404 0.7388535 0.74025974] mean value: 
0.7516581247605566 key: test_recall value: [0.75 0.9375 0.53333333 0.86666667 0.93333333 0.8 0.6875 0.8125 0.5625 0.625 ] mean value: 0.7508333333333334 key: train_recall value: [0.83571429 0.81428571 0.82978723 0.81560284 0.80851064 0.82269504 0.82857143 0.79285714 0.82857143 0.81428571] mean value: 0.8190881458966566 key: test_roc_auc value: [0.78125 0.71875 0.57916667 0.77708333 0.71666667 0.68125 0.64375 0.70625 0.54791667 0.77916667] mean value: 0.693125 key: train_roc_auc value: [0.78571429 0.76785714 0.7756079 0.74708713 0.77925532 0.78277609 0.77598784 0.79004559 0.76889564 0.76529889] mean value: 0.7738525835866261 key: test_jcc value: [0.63157895 0.625 0.38095238 0.65 0.60869565 0.54545455 0.5 0.59090909 0.39130435 0.58823529] mean value: 0.5512130258802086 key: train_jcc value: [0.66101695 0.63687151 0.65 0.61827957 0.64772727 0.65536723 0.64804469 0.65294118 0.64088398 0.63333333] mean value: 0.6444465712232499 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.24555159 0.0699985 0.06857967 0.06968999 0.07746792 0.06782889 0.06897974 0.07231164 0.06659174 0.06663275] mean value: 0.08736324310302734 key: score_time value: [0.01073599 0.01065278 0.01036191 0.01028013 0.01068068 0.0102334 0.01025152 0.01063037 0.01021504 0.01019859] mean value: 0.010424041748046875 key: test_mcc value: [0.75 0.81409158 0.87083333 0.87083333 0.4184137 0.6681531 0.52291252 0.55 0.67916667 0.67916667] mean value: 0.6823570904153414 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 0.90625 0.93548387 0.93548387 0.70967742 0.80645161 0.70967742 0.77419355 0.83870968 0.83870968] mean value: 0.8329637096774194 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.875 0.90909091 0.93333333 0.93333333 0.68965517 0.83333333 0.60869565 0.77419355 0.83870968 0.83870968] mean value: 0.8234054636904422 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.875 0.88235294 0.93333333 0.93333333 0.71428571 0.71428571 1. 0.8 0.86666667 0.86666667] mean value: 0.8585924369747899 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.9375 0.93333333 0.93333333 0.66666667 1. 0.4375 0.75 0.8125 0.8125 ] mean value: 0.8158333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.875 0.90625 0.93541667 0.93541667 0.70833333 0.8125 0.71875 0.775 0.83958333 0.83958333] mean value: 0.8345833333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.77777778 0.83333333 0.875 0.875 0.52631579 0.71428571 0.4375 0.63157895 0.72222222 0.72222222] mean value: 0.7115236006683375 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.02469635 0.03099465 0.03220057 0.05974889 0.03210664 0.04455256 0.06403112 0.06385612 0.06735516 0.05560327] mean value: 0.04751453399658203 key: score_time value: [0.01184535 0.01187634 0.01188827 0.02012181 0.01192713 0.02099824 0.0206399 0.02027392 0.02046394 0.01186919] mean value: 0.016190409660339355 key: test_mcc value: [0.57265629 0.62994079 0.4365267 0.43041423 0.16878989 0.39198315 0.29960206 0.48333333 0.43041423 0.74896053] mean value: 0.45926212084224455 key: train_mcc value: [0.84294316 0.8653681 0.85764944 0.8507372 0.87902123 0.8718845 0.8718845 0.87919331 0.85071454 0.87921164] mean value: 0.8648607631443651 key: test_accuracy value: [0.78125 0.8125 0.70967742 0.70967742 0.58064516 0.67741935 
0.64516129 0.74193548 0.70967742 0.87096774] mean value: 0.723891129032258 key: train_accuracy value: [0.92142857 0.93214286 0.92882562 0.9252669 0.93950178 0.93594306 0.93594306 0.93950178 0.9252669 0.93950178] mean value: 0.9323322318251144 key: test_fscore value: [0.75862069 0.82352941 0.64 0.72727273 0.60606061 0.72222222 0.62068966 0.75 0.68965517 0.86666667] mean value: 0.7204717151228307 key: train_fscore value: [0.92086331 0.93040293 0.92907801 0.92473118 0.93992933 0.93617021 0.93571429 0.93862816 0.92418773 0.93992933] mean value: 0.9319634476936138 key: test_precision value: [0.84615385 0.77777778 0.8 0.66666667 0.55555556 0.61904762 0.69230769 0.75 0.76923077 0.92857143] mean value: 0.7405311355311356 key: train_precision value: [0.92753623 0.95488722 0.92907801 0.93478261 0.93661972 0.93617021 0.93571429 0.94890511 0.93430657 0.93006993] mean value: 0.9368069898501369 key: test_recall value: [0.6875 0.875 0.53333333 0.8 0.66666667 0.86666667 0.5625 0.75 0.625 0.8125 ] mean value: 0.7179166666666666 key: train_recall value: [0.91428571 0.90714286 0.92907801 0.91489362 0.94326241 0.93617021 0.93571429 0.92857143 0.91428571 0.95 ] mean value: 0.9273404255319149 key: test_roc_auc value: [0.78125 0.8125 0.70416667 0.7125 0.58333333 0.68333333 0.64791667 0.74166667 0.7125 0.87291667] mean value: 0.7252083333333333 key: train_roc_auc value: [0.92142857 0.93214286 0.92882472 0.92530395 0.93948835 0.93594225 0.93594225 0.93946302 0.92522796 0.93953901] mean value: 0.9323302938196555 key: test_jcc value: [0.61111111 0.7 0.47058824 0.57142857 0.43478261 0.56521739 0.45 0.6 0.52631579 0.76470588] mean value: 0.5694149589660426 key: train_jcc value: [0.85333333 0.86986301 0.86754967 0.86 0.88666667 0.88 0.87919463 0.88435374 0.8590604 0.88666667] mean value: 0.8726688124293115 MCC on Blind test: 0.36 Accuracy on Blind test: 0.68 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), 
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01310349 0.00920939 0.00916791 0.00908709 0.00897574 0.00889897 0.0089469 0.00908518 0.00897837 0.0089066 ] mean value: 0.00943596363067627 key: score_time value: [0.0314641 0.00882173 0.00867224 0.00837922 0.00839758 0.0084486 0.0083878 0.00838399 0.00836039 0.00852084] mean value: 0.010783648490905762 key: test_mcc value: [ 0.46056619 0.59215653 0.35445878 0.5612264 -0.02227177 0.16878989 0.28870546 0.48954403 0.48333333 0.82285074] mean value: 0.419935957325999 key: train_mcc value: [0.43760498 0.45195269 0.49730798 0.46185564 0.48917077 0.48864808 0.47585629 0.46906706 0.45260942 0.44044429] mean value: 0.46645171945247466 key: test_accuracy value: [0.71875 0.78125 0.67741935 0.77419355 0.48387097 0.58064516 0.64516129 0.74193548 0.74193548 0.90322581] mean value: 0.7048387096774194 key: train_accuracy value: [0.71785714 0.725 0.74733096 0.72953737 0.74377224 0.74377224 0.7366548 0.73309609 0.72597865 0.71886121] mean value: 0.7321860701576004 key: test_fscore value: [0.75675676 0.81081081 0.64285714 0.78787879 0.55555556 0.60606061 0.66666667 0.73333333 0.75 0.89655172] mean value: 0.7206471384057591 key: train_fscore value: [0.73037543 0.73720137 0.76094276 0.74496644 0.75510204 0.75342466 0.74829932 0.74576271 0.73170732 0.73220339] mean value: 0.7439985432551205 key: test_precision value: [0.66666667 0.71428571 
0.69230769 0.72222222 0.47619048 0.55555556 0.64705882 0.78571429 0.75 1. ] mean value: 0.7010001436472024 key: train_precision value: [0.69934641 0.70588235 0.72435897 0.70700637 0.7254902 0.72847682 0.71428571 0.70967742 0.71428571 0.69677419] mean value: 0.71255841607008 key: test_recall value: [0.875 0.9375 0.6 0.86666667 0.66666667 0.66666667 0.6875 0.6875 0.75 0.8125 ] mean value: 0.755 key: train_recall value: [0.76428571 0.77142857 0.80141844 0.78723404 0.78723404 0.78014184 0.78571429 0.78571429 0.75 0.77142857] mean value: 0.7784599797365754 key: test_roc_auc value: [0.71875 0.78125 0.675 0.77708333 0.48958333 0.58333333 0.64375 0.74375 0.74166667 0.90625 ] mean value: 0.7060416666666667 key: train_roc_auc value: [0.71785714 0.725 0.74713779 0.72933131 0.74361702 0.74364235 0.73682877 0.73328267 0.72606383 0.71904762] mean value: 0.7321808510638298 key: test_jcc value: [0.60869565 0.68181818 0.47368421 0.65 0.38461538 0.43478261 0.5 0.57894737 0.6 0.8125 ] mean value: 0.57250434062505 key: train_jcc value: [0.57526882 0.58378378 0.61413043 0.59358289 0.60655738 0.6043956 0.59782609 0.59459459 0.57692308 0.57754011] mean value: 0.5924602770342078 MCC on Blind test: 0.31 Accuracy on Blind test: 0.66 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01268196 0.01671743 0.01599097 0.01547456 0.01932931 0.01919198 0.01620913 0.01577759 0.01730418 0.01631546] mean value: 0.016499257087707518 key: score_time value: [0.00838947 0.01079845 0.01083779 0.0114007 0.01187229 0.01147294 0.01136541 0.011338 0.01137733 0.01140523] mean value: 0.011025762557983399 key: test_mcc value: [0.50395263 0.68884672 0.4184137 0.18856181 0.55 0.35416667 0.57104024 0.54812195 0.57461167 0.4770843 ] mean value: 0.48747996920218695 key: train_mcc value: [0.66800328 0.72514339 0.73132112 0.3380284 0.79461478 0.7291921 0.67477868 0.75091185 0.59031555 0.73396841] mean value: 0.673627756111491 key: test_accuracy value: [0.75 0.84375 0.70967742 0.5483871 0.77419355 0.67741935 0.74193548 0.77419355 0.77419355 0.70967742] mean value: 0.7303427419354839 key: train_accuracy value: [0.81785714 0.85357143 0.86476868 0.60142349 0.89679715 0.84697509 0.82562278 0.87544484 0.75800712 0.86120996] mean value: 0.8201677681748856 key: test_fscore value: [0.76470588 0.83870968 0.68965517 0.125 0.77419355 0.66666667 0.66666667 0.78787879 0.81081081 0.64 ] mean value: 0.6764287212596118 key: train_fscore value: [0.84210526 0.83534137 0.86986301 0.34117647 0.89454545 0.82008368 0.79835391 0.87544484 0.8045977 0.84705882] mean value: 0.792857052346194 key: test_precision value: [0.72222222 0.86666667 0.71428571 1. 0.75 0.66666667 1. 0.76470588 0.71428571 0.88888889] mean value: 0.8087721755368814 key: train_precision value: [0.7431694 0.95412844 0.8410596 1. 0.91791045 1. 
0.94174757 0.87234043 0.67307692 0.93913043] mean value: 0.8882563245891257 key: test_recall value: [0.8125 0.8125 0.66666667 0.06666667 0.8 0.66666667 0.5 0.8125 0.9375 0.5 ] mean value: 0.6575 key: train_recall value: [0.97142857 0.74285714 0.90070922 0.20567376 0.87234043 0.69503546 0.69285714 0.87857143 1. 0.77142857] mean value: 0.7730901722391084 key: test_roc_auc value: [0.75 0.84375 0.70833333 0.53333333 0.775 0.67708333 0.75 0.77291667 0.76875 0.71666667] mean value: 0.7295833333333334 key: train_roc_auc value: [0.81785714 0.85357143 0.86464032 0.60283688 0.8968845 0.84751773 0.82515198 0.87545593 0.75886525 0.86089159] mean value: 0.8203672745694023 key: test_jcc value: [0.61904762 0.72222222 0.52631579 0.06666667 0.63157895 0.5 0.5 0.65 0.68181818 0.47058824] mean value: 0.5368237661890912 key: train_jcc value: [0.72727273 0.71724138 0.76969697 0.20567376 0.80921053 0.69503546 0.66438356 0.77848101 0.67307692 0.73469388] mean value: 0.6774766197383995 MCC on Blind test: 0.33 Accuracy on Blind test: 0.66 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01721334 0.01558971 0.0196166 0.01636291 0.01849246 0.01584291 0.01977682 0.01872659 0.01812482 0.01565337] mean value: 0.01753995418548584 key: score_time value: [0.01143718 0.01138806 0.0113678 0.01132464 0.01139116 0.01137233 0.01146889 0.01145911 0.01139402 0.01143646] mean value: 0.011403965950012206 key: test_mcc value: [0.48038446 0.59215653 0.37191715 0.63696156 0.54812195 0.42460389 0.42352151 0.53006813 0.53006813 0.53006813] mean value: 0.5067871441548388 key: train_mcc value: [0.46056619 0.75017225 0.64420772 0.7197701 0.70326066 0.67363597 0.72865591 0.71629725 0.75766296 0.73396841] mean value: 0.6888197434733603 key: test_accuracy value: [0.6875 0.78125 0.67741935 0.80645161 0.77419355 0.64516129 0.67741935 0.74193548 0.74193548 0.74193548] mean value: 0.7275201612903226 key: train_accuracy value: [0.675 0.875 0.79359431 0.85053381 0.84341637 0.81494662 0.84697509 0.84697509 0.8683274 0.86120996] mean value: 0.8275978647686832 key: test_fscore value: [0.76190476 0.81081081 0.70588235 0.82352941 0.75862069 0.73170732 0.58333333 0.69230769 0.69230769 0.69230769] mean value: 0.7252711754406208 key: train_fscore value: [0.75471698 0.87364621 0.82941176 0.86624204 0.82539683 0.84337349 0.8185654 0.8244898 0.85020243 0.84705882] mean value: 0.8333103762254988 key: test_precision value: [0.61538462 0.71428571 0.63157895 0.73684211 0.78571429 0.57692308 0.875 0.9 0.9 0.9 ] mean value: 0.7635728744939271 key: train_precision value: [0.60606061 0.88321168 0.70854271 0.78612717 0.93693694 0.73298429 1. 
0.96190476 0.98130841 0.93913043] mean value: 0.8536207004123598 key: test_recall value: [1. 0.9375 0.8 0.93333333 0.73333333 1. 0.4375 0.5625 0.5625 0.5625 ] mean value: 0.7529166666666667 key: train_recall value: [1. 0.86428571 1. 0.96453901 0.73758865 0.9929078 0.69285714 0.72142857 0.75 0.77142857] mean value: 0.8495035460992908 key: test_roc_auc value: [0.6875 0.78125 0.68125 0.81041667 0.77291667 0.65625 0.68541667 0.74791667 0.74791667 0.74791667] mean value: 0.731875 key: train_roc_auc value: [0.675 0.875 0.79285714 0.85012665 0.84379433 0.81431104 0.84642857 0.84652989 0.8679078 0.86089159] mean value: 0.8272847011144884 key: test_jcc value: [0.61538462 0.68181818 0.54545455 0.7 0.61111111 0.57692308 0.41176471 0.52941176 0.52941176 0.52941176] mean value: 0.5730691530691531 key: train_jcc value: [0.60606061 0.77564103 0.70854271 0.76404494 0.7027027 0.72916667 0.69285714 0.70138889 0.73943662 0.73469388] mean value: 0.7154535187474427 MCC on Blind test: 0.41 Accuracy on Blind test: 0.68 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.14774513 0.12925386 0.12936115 0.13003302 0.13002491 0.13006401 0.13567472 0.1359446 0.13547182 0.14191341] mean value: 0.13454866409301758 key: score_time value: [0.01482654 0.01483226 0.01495528 0.01483512 0.01467562 0.01515245 0.02102137 0.01637673 0.01544118 0.01475692] mean value: 0.01568734645843506 key: test_mcc value: [0.75592895 0.625 0.55 0.6310315 0.6125 0.5612264 0.33487648 0.69203857 0.48954403 0.63696156] mean value: 0.588910748855813 key: train_mcc value: [0.97859639 0.97859639 0.9929078 0.97162977 0.9929078 0.9929078 0.99290744 0.97867167 0.9929078 1. ] mean value: 0.9872032871447398 key: test_accuracy value: [0.875 0.8125 0.77419355 0.80645161 0.80645161 0.77419355 0.64516129 0.83870968 0.74193548 0.80645161] mean value: 0.7881048387096774 key: train_accuracy value: [0.98928571 0.98928571 0.99644128 0.98576512 0.99644128 0.99644128 0.99644128 0.98932384 0.99644128 1. ] mean value: 0.9935866802236909 key: test_fscore value: [0.88235294 0.8125 0.77419355 0.76923077 0.8 0.78787879 0.56 0.82758621 0.73333333 0.78571429] mean value: 0.7732789872617295 key: train_fscore value: [0.98932384 0.98932384 0.99644128 0.98571429 0.99644128 0.99644128 0.99641577 0.98924731 0.99644128 1. ] mean value: 0.9935790179539462 key: test_precision value: [0.83333333 0.8125 0.75 0.90909091 0.8 0.72222222 0.77777778 0.92307692 0.78571429 0.91666667] mean value: 0.8230382117882118 key: train_precision value: [0.9858156 0.9858156 1. 0.99280576 1. 1. 1. 0.99280576 0.9929078 1. 
] mean value: 0.9950150517883566 key: test_recall value: [0.9375 0.8125 0.8 0.66666667 0.8 0.86666667 0.4375 0.75 0.6875 0.6875 ] mean value: 0.7445833333333334 key: train_recall value: [0.99285714 0.99285714 0.9929078 0.9787234 0.9929078 0.9929078 0.99285714 0.98571429 1. 1. ] mean value: 0.9921732522796353 key: test_roc_auc value: [0.875 0.8125 0.775 0.80208333 0.80625 0.77708333 0.65208333 0.84166667 0.74375 0.81041667] mean value: 0.7895833333333333 key: train_roc_auc value: [0.98928571 0.98928571 0.9964539 0.98579027 0.9964539 0.9964539 0.99642857 0.98931104 0.9964539 1. ] mean value: 0.9935916919959473 key: test_jcc value: [0.78947368 0.68421053 0.63157895 0.625 0.66666667 0.65 0.38888889 0.70588235 0.57894737 0.64705882] mean value: 0.6367707258341934 key: train_jcc value: [0.97887324 0.97887324 0.9929078 0.97183099 0.9929078 0.9929078 0.99285714 0.9787234 0.9929078 1. ] mean value: 0.9872789217574953 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, 
gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05547452 0.05822182 0.05548573 0.05391788 0.05418348 0.04964733 0.05522466 0.07143283 0.07606363 0.05446839] mean value: 0.058412027359008786 key: score_time value: [0.02311087 0.02655649 0.01827073 0.01858497 0.02204895 0.02504039 0.03309536 0.02989912 0.03761315 0.02283883] mean value: 0.025705885887145997 key: test_mcc value: [0.75592895 0.75592895 0.6778302 0.80753845 0.48527095 0.6681531 0.42352151 0.55 0.69203857 0.61925228] mean value: 0.6435462955940828 key: train_mcc value: [0.97182532 0.93688613 0.97192667 0.94460323 0.97162977 0.97867167 0.96501929 0.98586412 0.978869 0.95816272] mean value: 0.9663457912208626 key: test_accuracy value: [0.875 0.875 0.83870968 0.90322581 0.74193548 0.80645161 0.67741935 0.77419355 0.83870968 0.80645161] mean value: 0.8137096774193548 key: train_accuracy value: [0.98571429 0.96785714 0.98576512 0.97153025 0.98576512 0.98932384 0.98220641 0.99288256 0.98932384 0.97864769] mean value: 0.982901626842908 key: test_fscore value: [0.86666667 0.86666667 0.82758621 0.89655172 0.71428571 0.83333333 0.58333333 0.77419355 0.82758621 0.8 ] mean value: 0.7990203400603846 key: train_fscore value: [0.98550725 0.96703297 0.98561151 0.97080292 0.98571429 0.98939929 0.98181818 0.99280576 0.98916968 0.97810219] mean value: 0.982596402499482 key: test_precision value: [0.92857143 0.92857143 0.85714286 0.92857143 0.76923077 0.71428571 0.875 0.8 0.92307692 0.85714286] mean value: 0.8581593406593406 key: train_precision value: [1. 0.9924812 1. 1. 0.99280576 0.98591549 1. 1. 1. 1. 
] mean value: 0.9971202451360949 key: test_recall value: [0.8125 0.8125 0.8 0.86666667 0.66666667 1. 0.4375 0.75 0.75 0.75 ] mean value: 0.7645833333333334 key: train_recall value: [0.97142857 0.94285714 0.97163121 0.94326241 0.9787234 0.9929078 0.96428571 0.98571429 0.97857143 0.95714286] mean value: 0.9686524822695035 key: test_roc_auc value: [0.875 0.875 0.8375 0.90208333 0.73958333 0.8125 0.68541667 0.775 0.84166667 0.80833333] mean value: 0.8152083333333333 key: train_roc_auc value: [0.98571429 0.96785714 0.9858156 0.97163121 0.98579027 0.98931104 0.98214286 0.99285714 0.98928571 0.97857143] mean value: 0.9828976697061804 key: test_jcc value: [0.76470588 0.76470588 0.70588235 0.8125 0.55555556 0.71428571 0.41176471 0.63157895 0.70588235 0.66666667] mean value: 0.6733528060346946 key: train_jcc value: [0.97142857 0.93617021 0.97163121 0.94326241 0.97183099 0.97902098 0.96428571 0.98571429 0.97857143 0.95714286] mean value: 0.9659058651866563 MCC on Blind test: 0.49 Accuracy on Blind test: 0.74 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.07153654 0.08550692 0.08657861 0.0378139 0.0400362 0.10264516 0.09579253 0.07016683 0.03786349 0.03786325] mean value: 0.06658034324645996 key: score_time value: [0.02121544 0.02933049 0.01310635 0.0131979 0.01302719 0.02128839 0.02968192 0.01325464 0.01323366 0.01954889] mean value: 0.018688488006591796 key: test_mcc value: [0.56360186 0.31311215 0.48333333 0.5612264 0.39198315 0.43041423 0.48954403 0.29069387 0.61925228 0.57104024] mean value: 0.4714201549672476 key: train_mcc value: [0.98571429 0.98571429 0.99290744 0.98576494 0.98576494 0.9929078 0.98576494 0.98576494 0.98576494 0.98576494] mean value: 0.9871833481894341 key: test_accuracy value: [0.78125 0.65625 0.74193548 0.77419355 0.67741935 0.70967742 0.74193548 0.64516129 0.80645161 0.74193548] mean value: 0.7276209677419355 key: train_accuracy value: [0.99285714 0.99285714 0.99644128 0.99288256 0.99288256 0.99644128 0.99288256 0.99288256 0.99288256 0.99288256] mean value: 0.9935892221657346 key: test_fscore value: [0.78787879 0.66666667 0.73333333 0.78787879 0.72222222 0.72727273 0.73333333 0.68571429 0.8 0.66666667] mean value: 0.731096681096681 key: train_fscore value: [0.99285714 0.99285714 0.99646643 0.9929078 0.9929078 0.99644128 0.99285714 0.99285714 0.99285714 0.99285714] mean value: 0.9935866172213933 key: test_precision value: [0.76470588 0.64705882 0.73333333 0.72222222 0.61904762 0.66666667 0.78571429 0.63157895 0.85714286 1. ] mean value: 0.7427470637377758 key: train_precision value: [0.99285714 0.99285714 0.99295775 0.9929078 0.9929078 1. 
0.99285714 0.99285714 0.99285714 0.99285714] mean value: 0.993591620645861 key: test_recall value: [0.8125 0.6875 0.73333333 0.86666667 0.86666667 0.8 0.6875 0.75 0.75 0.5 ] mean value: 0.7454166666666666 key: train_recall value: [0.99285714 0.99285714 1. 0.9929078 0.9929078 0.9929078 0.99285714 0.99285714 0.99285714 0.99285714] mean value: 0.9935866261398176 key: test_roc_auc value: [0.78125 0.65625 0.74166667 0.77708333 0.68333333 0.7125 0.74375 0.64166667 0.80833333 0.75 ] mean value: 0.7295833333333334 key: train_roc_auc value: [0.99285714 0.99285714 0.99642857 0.99288247 0.99288247 0.9964539 0.99288247 0.99288247 0.99288247 0.99288247] mean value: 0.9935891590678825 key: test_jcc value: [0.65 0.5 0.57894737 0.65 0.56521739 0.57142857 0.57894737 0.52173913 0.66666667 0.5 ] mean value: 0.5782946496676473 key: train_jcc value: [0.9858156 0.9858156 0.99295775 0.98591549 0.98591549 0.9929078 0.9858156 0.9858156 0.9858156 0.9858156 ] mean value: 0.9872590150834082 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.48413157 0.47434545 0.47634721 0.47761345 0.47614884 0.47844601 0.47672391 0.47568178 0.47936511 0.48416018] mean value: 0.4782963514328003 key: score_time value: [0.00920129 0.00912952 0.0093317 0.0096333 0.00975752 0.00908947 0.00899506 0.00906134 0.00970078 0.00986767] mean value: 0.009376764297485352 key: test_mcc value: [0.62994079 0.875 0.74896053 0.80753845 0.4184137 0.69203857 0.4770843 0.61925228 0.69203857 0.63696156] mean value: 0.6597228748980087 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.9375 0.87096774 0.90322581 0.70967742 0.83870968 0.70967742 0.80645161 0.83870968 0.80645161] mean value: 0.8233870967741935 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.9375 0.875 0.89655172 0.68965517 0.84848485 0.64 0.8 0.82758621 0.78571429] mean value: 0.810049223764741 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85714286 0.9375 0.82352941 0.92857143 0.71428571 0.77777778 0.88888889 0.85714286 0.92307692 0.91666667] mean value: 0.8624582525317819 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.9375 0.93333333 0.86666667 0.66666667 0.93333333 0.5 0.75 0.75 0.6875 ] mean value: 0.7775 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.8125 0.9375 0.87291667 0.90208333 0.70833333 0.84166667 0.71666667 0.80833333 0.84166667 0.81041667] mean value: 0.8252083333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.88235294 0.77777778 0.8125 0.52631579 0.73684211 0.47058824 0.66666667 0.70588235 0.64705882] mean value: 0.6892651358789129 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.6 Accuracy on Blind test: 0.8 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), 
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.023242 0.02464414 0.03425264 0.02344084 0.02590322 0.02454281 0.02392101 0.02409291 0.02381301 0.02434063] mean value: 0.025219321250915527 key: score_time value: [0.01257801 0.01315999 0.01226115 0.01292157 0.01545835 0.01711035 0.01511407 0.01517534 0.01511908 0.01794767] mean value: 0.014684557914733887 key: test_mcc value: [0.53935989 0.62994079 0.22364661 0.50443936 0.05046084 0.372678 0.43041423 0.29069387 0.29166667 0.67916667] mean value: 0.4012466911245091 key: train_mcc value: [0.98581488 0.99288247 0.91791995 0.98576494 0.95816272 1.
0.95078573 0.99290744 0.978869 0.92456546] mean value: 0.9687672601676953 key: test_accuracy value: [0.75 0.8125 0.61290323 0.74193548 0.51612903 0.61290323 0.70967742 0.64516129 0.64516129 0.83870968] mean value: 0.688508064516129 key: train_accuracy value: [0.99285714 0.99642857 0.95729537 0.99288256 0.97864769 1. 0.97508897 0.99644128 0.98932384 0.96085409] mean value: 0.9839819522114895 key: test_fscore value: [0.78947368 0.82352941 0.57142857 0.76470588 0.59459459 0.71428571 0.68965517 0.68571429 0.64516129 0.83870968] mean value: 0.7117258284507069 key: train_fscore value: [0.9929078 0.99641577 0.95918367 0.9929078 0.97916667 1. 0.9754386 0.99641577 0.98916968 0.96219931] mean value: 0.984380506848783 key: test_precision value: [0.68181818 0.77777778 0.61538462 0.68421053 0.5 0.55555556 0.76923077 0.63157895 0.66666667 0.86666667] mean value: 0.6748889706784443 key: train_precision value: [0.98591549 1. 0.92156863 0.9929078 0.95918367 1. 0.95862069 1. 1. 0.92715232] mean value: 0.9745348602832521 key: test_recall value: [0.9375 0.875 0.53333333 0.86666667 0.73333333 1. 0.625 0.75 0.625 0.8125 ] mean value: 0.7758333333333334 key: train_recall value: [1. 0.99285714 1. 0.9929078 1. 1. 0.99285714 0.99285714 0.97857143 1. ] mean value: 0.9950050658561297 key: test_roc_auc value: [0.75 0.8125 0.61041667 0.74583333 0.52291667 0.625 0.7125 0.64166667 0.64583333 0.83958333] mean value: 0.690625 key: train_roc_auc value: [0.99285714 0.99642857 0.95714286 0.99288247 0.97857143 1. 0.97515198 0.99642857 0.98928571 0.96099291] mean value: 0.9839741641337386 key: test_jcc value: [0.65217391 0.7 0.4 0.61904762 0.42307692 0.55555556 0.52631579 0.52173913 0.47619048 0.72222222] mean value: 0.5596321629044742 key: train_jcc value: [0.98591549 0.99285714 0.92156863 0.98591549 0.95918367 1. 
0.95205479 0.99285714 0.97857143 0.92715232] mean value: 0.9696076113522918 MCC on Blind test: 0.13 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02307701 0.03570557 0.0143652 0.01603484 0.01425147 0.01433063 0.03359962 0.03737807 0.0342834 0.0371778 ] mean value: 0.026020359992980958 key: score_time value: [0.02180982 0.01184416 0.0117712 0.01172042 0.01180339 0.01192832 0.02126312 0.02331209 0.0165627 0.02375984] mean value: 0.016577506065368654 key: test_mcc value: [0.62994079 0.72374686 0.4184137 0.61925228 0.16878989 0.6681531 0.37191715 0.48333333 0.6125 0.5612264 ] mean value: 0.5257273524179626 key: train_mcc value: [0.79303924 0.81503456 0.81494428 0.80802555 0.8363139 0.81630561 0.83730807 0.80101379 0.77982279 0.82207812] mean value: 0.8123885891717825 key: test_accuracy value: [0.8125 0.84375 0.70967742 0.80645161 0.58064516 0.80645161 0.67741935 0.74193548 0.80645161 0.77419355] mean value: 0.7559475806451613 key: train_accuracy value: [0.89642857 0.90714286 0.90747331 0.90391459 0.91814947 0.90747331 0.91814947 0.90035587 0.88967972 0.91103203] mean value: 0.9059799186578545 key: test_fscore value: [0.8 0.86486486 0.68965517 0.8125 0.60606061 0.83333333 0.64285714 0.75 0.8125 0.75862069] mean value: 
0.7570391809184912 key: train_fscore value: [0.89530686 0.90510949 0.90780142 0.90322581 0.91872792 0.90510949 0.91575092 0.89855072 0.88727273 0.91039427] mean value: 0.9047249610287941
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:196: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_7030.py:199: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_precision value: [0.85714286 0.76190476 0.71428571 0.76470588 0.55555556 0.71428571 0.75 0.75 0.8125 0.84615385] mean value: 0.752653433168139 key: train_precision value: [0.90510949 0.92537313 0.90780142 0.91304348 0.91549296 0.93233083 0.93984962 0.91176471 0.9037037 0.91366906] mean value: 0.9168138403288595 key: test_recall value: [0.75 1. 0.66666667 0.86666667 0.66666667 1.
0.5625 0.75 0.8125 0.6875 ] mean value: 0.77625 key: train_recall value: [0.88571429 0.88571429 0.90780142 0.89361702 0.92198582 0.87943262 0.89285714 0.88571429 0.87142857 0.90714286] mean value: 0.8931408308004053 key: test_roc_auc value: [0.8125 0.84375 0.70833333 0.80833333 0.58333333 0.8125 0.68125 0.74166667 0.80625 0.77708333] mean value: 0.7575000000000001 key: train_roc_auc value: [0.89642857 0.90714286 0.90747214 0.90395137 0.91813576 0.90757345 0.91805978 0.90030395 0.88961499 0.91101824] mean value: 0.9059701114488349 key: test_jcc value: [0.66666667 0.76190476 0.52631579 0.68421053 0.43478261 0.71428571 0.47368421 0.6 0.68421053 0.61111111] mean value: 0.6157171915295485 key: train_jcc value: [0.81045752 0.82666667 0.83116883 0.82352941 0.8496732 0.82666667 0.84459459 0.81578947 0.79738562 0.83552632] mean value: 0.8261458300204431 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.25024295 0.12240052 0.26541185 0.16076803 0.26370955 0.23450518 0.24887896 0.24265242 0.2682631 0.25118542] mean value: 0.23080179691314698 key: score_time value: [0.01692557 0.01229692 0.02219248 0.01252437 0.02168393 0.0223794 0.02114606 0.0221355 0.02409887 0.02430439] mean value: 0.019968748092651367 key: test_mcc value: [0.56360186 0.64549722 0.42321607 0.55 0.55 0.4770843 0.23012754 0.48527095 0.61925228 0.58316015] mean value: 0.5127210368867323 key: train_mcc value: [0.65041494 0.65014929 0.67315825 0.67267845 0.66585571 0.69395613 0.66565335 0.68713898 0.66594028 0.67989057] mean value: 0.6704835938327732 key: test_accuracy value: [0.78125 0.8125 0.70967742 0.77419355 0.77419355 0.70967742 0.61290323 0.74193548 0.80645161 0.77419355] mean value: 0.7496975806451613 key: train_accuracy value: [0.825 0.825 0.83629893 0.83629893 0.83274021 0.84697509 0.83274021 0.84341637 0.83274021 0.83985765] mean value: 0.8351067615658363 key: test_fscore value: [0.77419355 0.83333333 0.66666667 0.77419355 0.77419355 0.75675676 0.6 0.76470588 0.8 0.74074074] mean value: 0.7484784025011729 key: train_fscore value: [0.82807018 0.82685512 0.84027778 0.83571429 0.83623693 0.84805654 0.83392226 0.84507042 0.83508772 0.8409894 ] mean value: 0.8370280636116797 key: test_precision value: [0.8 0.75 0.75 0.75 0.75 0.63636364 0.64285714 0.72222222 0.85714286 0.90909091] mean value: 0.7567676767676768 key: train_precision value: [0.8137931 0.81818182 0.82312925 0.84172662 0.82191781 0.84507042 0.82517483 0.83333333 0.82068966 0.83216783] mean value: 0.8275184668638604 key: 
test_recall value: [0.75 0.9375 0.6 0.8 0.8 0.93333333 0.5625 0.8125 0.75 0.625 ] mean value: 0.7570833333333333 key: train_recall value: [0.84285714 0.83571429 0.85815603 0.82978723 0.85106383 0.85106383 0.84285714 0.85714286 0.85 0.85 ] mean value: 0.8468642350557244 key: test_roc_auc value: [0.78125 0.8125 0.70625 0.775 0.775 0.71666667 0.61458333 0.73958333 0.80833333 0.77916667] mean value: 0.7508333333333334 key: train_roc_auc value: [0.825 0.825 0.83622087 0.83632219 0.83267477 0.84696049 0.83277609 0.84346505 0.83280142 0.83989362] mean value: 0.8351114488348531 key: test_jcc value: [0.63157895 0.71428571 0.5 0.63157895 0.63157895 0.60869565 0.42857143 0.61904762 0.66666667 0.58823529] mean value: 0.6020239216968252 key: train_jcc value: [0.70658683 0.70481928 0.7245509 0.71779141 0.71856287 0.73619632 0.71515152 0.73170732 0.71686747 0.72560976] mean value: 0.7197843664173944 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7