/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_sl.py:548: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead. from pandas import MultiIndex, Int64Index 1.22.4 1.4.1 aaindex_df contains non-numerical data Total no. of non-numerial columns: 2 Selecting numerical data only PASS: successfully selected numerical columns only for aaindex_df Now checking for NA in the remaining aaindex_cols Counting aaindex_df cols with NA ncols with NA: 4 columns Dropping these... Original ncols: 127 Revised df ncols: 123 Checking NA in revised df... PASS: cols with NA successfully dropped from aaindex_df Proceeding with combining aa_df with other features_df PASS: ncols match Expected ncols: 123 Got: 123 Total no. of columns in clean aa_df: 123 Proceeding to merge, expected nrows in merged_df: 424 PASS: my_features_df and aa_df successfully combined nrows: 424 ncols: 265 count of NULL values before imputation or_mychisq 102 log10_or_mychisq 102 dtype: int64 count of NULL values AFTER imputation mutationinformation 0 or_rawI 0 logorI 0 dtype: int64 PASS: OR values imputed, data ready for ML Total no. of features for aaindex: 123 No. of numerical features: 166 No. of categorical features: 7 PASS: x_features has no target variable No. of columns for x_features: 173 ------------------------------------------------------------- Successfully split data with stratification according to scaling law [COMPLETE data]: 1/sqrt(x_ncols) Input features data size: (424, 173) Train data size: (391, 173) Test data size: (33, 173) y_train numbers: Counter({1: 215, 0: 176}) y_train ratio: 0.8186046511627907 y_test_numbers: Counter({1: 18, 0: 15}) y_test ratio: 0.8333333333333334 ------------------------------------------------------------- index: 0 ind: 1 Mask count check: True Original Data Counter({1: 215, 0: 176}) Data dim: (391, 173) Simple Random OverSampling Counter({1: 215, 0: 215}) (430, 173) Simple Random UnderSampling Counter({0: 176, 1: 176}) (352, 173) Simple Combined Over and UnderSampling Counter({0: 215, 1: 215}) (430, 173) SMOTE_NC OverSampling Counter({1: 215, 0: 215}) (430, 173) ##################################################################### Running ML analysis [COMPLETE DATA]: 70/30 split Gene name: pncA Drug name: pyrazinamide Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_cd_sl/ Sanity checks: Total input features: 173 Training data size: (391, 173) Test data size: (33, 173) Target feature numbers (training data): Counter({1: 215, 0: 176}) Target features ratio (training data: 0.8186046511627907 Target feature numbers (test data): Counter({1: 18, 0: 15}) Target features ratio (test data): 0.8333333333333334 ##################################################################### ================================================================ Strucutral features (n): 34 These are: Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts'] FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss'] Other struc columns: ['rsa', 'kd_values', 'rd_values'] ================================================================ AAindex features (n): 123 These are: ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'] ================================================================ Evolutionary features (n): 3 These are: ['consurf_score', 'snap2_score', 'provean_score'] ================================================================ Genomic features (n): 6 These are: ['maf', 'logorI'] ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'] ================================================================ Categorical features (n): 7 These are: ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'] ================================================================ Pass: No. of features match ##################################################################### Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.06880522 0.06128597 0.03583455 0.03662205 0.03685975 0.03469419 0.07428098 0.07097745 0.07788467 0.0719893 ] mean value: 0.056923413276672365 key: score_time value: [0.02212453 0.0147028 0.01471114 0.01479864 0.01835752 0.01722169 0.01488853 0.02007484 0.01981497 0.01230764] mean value: 0.016900229454040527 key: test_mcc value: [0.38978972 0.31232635 0.37433155 0.74203177 0.63570849 0.28496141 0.59384599 0.54870326 0.58730159 0.59366961] mean value: 0.5062669750285254 key: train_mcc value: [0.72322874 0.67828767 0.71860784 0.68388464 0.72998334 0.71807179 0.70142242 0.72384814 0.72961234 0.68920955] mean value: 0.7096156476090553 key: test_accuracy value: [0.7 0.66666667 0.69230769 0.87179487 0.82051282 0.64102564 0.79487179 0.74358974 0.79487179 0.79487179] mean value: 0.752051282051282 key: train_accuracy value: [0.86324786 0.84090909 0.86079545 0.84375 0.86647727 0.86079545 0.85227273 0.86363636 0.86647727 0.84659091] mean value: 0.8564952408702409 key: test_fscore value: [0.73913043 0.72340426 0.72727273 0.88372093 0.85106383 0.65 0.82608696 0.80769231 0.80952381 0.8 ] mean value: 0.7817895251132134 key: train_fscore value: [0.87755102 0.85641026 0.87403599 0.86005089 0.88040712 0.87594937 0.86597938 0.87817259 0.88101266 0.86363636] mean value: 0.8713205641031424 key: test_precision value: [0.70833333 0.68 0.72727273 0.9047619 0.8 0.68421053 0.76 0.67741935 0.80952381 0.84210526] mean value: 0.7593626919204168 key: train_precision value: [0.86432161 0.84771574 0.86734694 0.845 0.865 0.86069652 0.86597938 0.865 0.86567164 0.84653465] mean value: 0.8593266476968946 key: test_recall value: [0.77272727 0.77272727 0.72727273 0.86363636 0.90909091 0.61904762 0.9047619 1. 0.80952381 0.76190476] mean value: 0.814069264069264 key: train_recall value: [0.89119171 0.86528497 0.88082902 0.87564767 0.89637306 0.89175258 0.86597938 0.89175258 0.89690722 0.8814433 ] mean value: 0.8837161476416858 key: test_roc_auc value: [0.69191919 0.65106952 0.68716578 0.87299465 0.80748663 0.64285714 0.78571429 0.72222222 0.79365079 0.79761905] mean value: 0.7452699261522792 key: train_roc_auc value: [0.86015282 0.83830286 0.8586535 0.84033956 0.86328087 0.85726869 0.85071121 0.86043325 0.86301057 0.84262038] mean value: 0.8534773716474491 key: test_jcc value: [0.5862069 0.56666667 0.57142857 0.79166667 0.74074074 0.48148148 0.7037037 0.67741935 0.68 0.66666667] mean value: 0.6465980748744931 key: train_jcc value: [0.78181818 0.74887892 0.77625571 0.75446429 0.78636364 0.77927928 0.76363636 0.78280543 0.78733032 0.76 ] mean value: 0.7720832124947455 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.88919163 1.10667205 1.70803356 1.91856122 1.52863455 1.64006972 1.92494702 1.51954269 1.73066807 1.40030789] mean value: 1.5366628408432006 key: score_time value: [0.01226068 0.02890587 0.02300549 0.02526402 0.02500796 0.01665282 0.02039099 0.01842761 0.02366877 0.01797628] mean value: 0.021156048774719237 key: test_mcc value: [0.38978972 0.42228828 0.42319443 0.5828877 0.63344389 0.28496141 0.64246755 0.51647727 0.53458203 0.64116318] mean value: 0.5071255464360515 key: train_mcc value: [0.67698193 0.66080987 0.6493094 0.71845866 0.77038423 0.7584744 0.75860776 0.78182499 0.66611594 0.72386034] mean value: 0.7164827518671235 key: test_accuracy value: [0.7 0.71794872 0.71794872 0.79487179 0.82051282 0.64102564 0.82051282 0.74358974 0.76923077 0.82051282] mean value: 0.7546153846153846 key: train_accuracy value: [0.84045584 0.83238636 0.82670455 0.86079545 0.88636364 0.88068182 0.88068182 0.89204545 0.83522727 0.86363636] mean value: 0.8598978567728568 key: test_fscore value: [0.73913043 0.7755102 0.75555556 0.81818182 0.84444444 0.65 0.84444444 0.8 0.79069767 0.82926829] mean value: 0.7847232868592036 key: train_fscore value: [0.85858586 0.85063291 0.84634761 0.87531807 0.89847716 0.89285714 0.89230769 0.90452261 0.85427136 0.87878788] mean value: 0.8752108284351288 key: test_precision value: [0.70833333 0.7037037 0.73913043 0.81818182 0.82608696 0.68421053 0.79166667 0.68965517 0.77272727 0.85 ] mean value: 0.7583695884646725 key: train_precision value: [0.83743842 0.83168317 0.82352941 0.86 0.88059701 0.88383838 0.8877551 0.88235294 0.83333333 0.86138614] mean value: 0.8581913917655096 key: test_recall value: [0.77272727 0.86363636 0.77272727 0.81818182 0.86363636 0.61904762 0.9047619 0.95238095 0.80952381 0.80952381] mean value: 0.8186147186147186 key: train_recall value: [0.88082902 0.87046632 0.87046632 0.89119171 0.91709845 0.90206186 0.89690722 0.92783505 0.87628866 0.89690722] mean value: 0.8930051813471502 key: test_roc_auc value: [0.69191919 0.69652406 0.70989305 0.79144385 0.81417112 0.64285714 0.81349206 0.72619048 0.76587302 0.82142857] mean value: 0.7473792547321959 key: train_roc_auc value: [0.83598413 0.82831492 0.82202561 0.85754554 0.88307752 0.87824612 0.87883336 0.88796816 0.83054939 0.85984601] mean value: 0.856239076622146 key: test_jcc value: [0.5862069 0.63333333 0.60714286 0.69230769 0.73076923 0.48148148 0.73076923 0.66666667 0.65384615 0.70833333] mean value: 0.6490856876201704 key: train_jcc value: [0.75221239 0.74008811 0.73362445 0.77828054 0.8156682 0.80645161 0.80555556 0.82568807 0.74561404 0.78378378] mean value: 0.7786966755732057 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.02437758 0.01368475 0.01371503 0.01373553 0.01398277 0.01400352 0.01363087 0.01404715 0.01381183 0.01371145] mean value: 0.014870047569274902 key: score_time value: [0.01235843 0.01252961 0.0123415 0.01239944 0.0125196 0.01226878 0.01242042 0.01238608 0.01226616 0.01235604] mean value: 0.012384605407714844 key: test_mcc value: [0.23466316 0.03656362 0.42012039 0.26162798 0.58501794 0.43085716 0.54870326 0.3474523 0.38575837 0.27777778] mean value: 0.35285419652273936 key: train_mcc value: [0.39819172 0.40075709 0.37914257 0.39405557 0.37165 0.41714552 0.35722079 0.40713425 0.38759372 0.38851052] mean value: 0.39014017351285274 key: test_accuracy value: [0.625 0.53846154 0.71794872 0.64102564 0.79487179 0.71794872 0.74358974 0.66666667 0.69230769 0.64102564] mean value: 0.6778846153846154 key: train_accuracy value: [0.7037037 0.70454545 0.69318182 0.69602273 0.69034091 0.71306818 0.68465909 0.70738636 0.69886364 0.69886364] mean value: 0.6990635521885522 key: test_fscore value: [0.71698113 0.625 0.76595745 0.69565217 0.83333333 0.74418605 0.80769231 0.74509804 0.75 0.66666667] mean value: 0.7350567146216648 key: train_fscore value: [0.75471698 0.75471698 0.75115207 0.76274945 0.74592075 0.76235294 0.74004684 0.76212471 0.75233645 0.75462963] mean value: 0.7540746796722013 key: test_precision value: [0.61290323 0.57692308 0.72 0.66666667 0.76923077 0.72727273 0.67741935 0.63333333 0.66666667 0.66666667] mean value: 0.6717082487405068 key: train_precision value: [0.69264069 0.69264069 0.67634855 0.66666667 0.6779661 0.7012987 0.67811159 0.69037657 0.68803419 0.68487395] mean value: 0.684895769729402 key: test_recall value: [0.86363636 0.68181818 0.81818182 0.72727273 0.90909091 0.76190476 1. 0.9047619 0.85714286 0.66666667] mean value: 0.819047619047619 key: train_recall value: [0.82901554 0.82901554 0.84455959 0.89119171 0.82901554 0.83505155 0.81443299 0.85051546 0.82989691 0.84020619] mean value: 0.8392901020244645 key: test_roc_auc value: [0.59848485 0.51737968 0.70320856 0.62834225 0.77807487 0.71428571 0.72222222 0.6468254 0.67857143 0.63888889] mean value: 0.6626283846872082 key: train_roc_auc value: [0.68982423 0.69123733 0.67699677 0.6751556 0.67551406 0.69917134 0.66987472 0.69108052 0.6839358 0.68276132] mean value: 0.6835551696333612 key: test_jcc value: [0.55882353 0.45454545 0.62068966 0.53333333 0.71428571 0.59259259 0.67741935 0.59375 0.6 0.5 ] mean value: 0.5845439634179983 key: train_jcc value: [0.60606061 0.60606061 0.60147601 0.61648746 0.59479554 0.61596958 0.58736059 0.61567164 0.60299625 0.60594796] mean value: 0.6052826249519565 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01415777 0.01403832 0.01400423 0.01419544 0.0141027 0.01410246 0.01409817 0.01396275 0.01395226 0.01405287] mean value: 0.014066696166992188 key: score_time value: [0.01240993 0.01239276 0.01233053 0.01219869 0.01227164 0.01228476 0.01232052 0.01230741 0.01235628 0.01227283] mean value: 0.012314534187316895 key: test_mcc value: [0.12974982 0.06149733 0.28117601 0.44298485 0.5828877 0.43535772 0.64246755 0.37940161 0.53826045 0.27348302] mean value: 0.37672660455215623 key: train_mcc value: [0.49959039 0.53088207 0.49047512 0.48722902 0.49476397 0.47780376 0.46632656 0.47174736 0.44408311 0.50475435] mean value: 0.4867655710273362 key: test_accuracy value: [0.575 0.53846154 0.64102564 0.71794872 0.79487179 0.71794872 0.82051282 0.69230769 0.76923077 0.64102564] mean value: 0.6908333333333333 key: train_accuracy value: [0.75213675 0.76704545 0.74715909 0.74715909 0.75 0.74147727 0.73579545 0.73863636 0.72443182 0.75568182] mean value: 0.7459523115773116 key: test_fscore value: [0.63829787 0.59090909 0.66666667 0.73170732 0.81818182 0.73170732 0.84444444 0.73913043 0.7804878 0.68181818] mean value: 0.7223350948167626 key: train_fscore value: [0.77402597 0.78534031 0.76762402 0.77694236 0.77319588 0.76485788 0.75968992 0.7628866 0.74805195 0.78172589] mean value: 0.7694340779160749 key: test_precision value: [0.6 0.59090909 0.7 0.78947368 0.81818182 0.75 0.79166667 0.68 0.8 0.65217391] mean value: 0.717240517301158 key: train_precision value: [0.77604167 0.79365079 0.77368421 0.75242718 0.76923077 0.76683938 0.76165803 0.7628866 0.7539267 0.77 ] mean value: 0.7680345333375814 key: test_recall value: [0.68181818 0.59090909 0.63636364 0.68181818 0.81818182 0.71428571 0.9047619 0.80952381 0.76190476 0.71428571] mean value: 0.7313852813852814 key: train_recall value: [0.77202073 0.77720207 0.76165803 0.80310881 0.77720207 0.7628866 0.75773196 0.7628866 0.74226804 0.79381443] mean value: 0.7710779338710538 key: test_roc_auc value: [0.56313131 0.53074866 0.64171123 0.72326203 0.79144385 0.71825397 0.81349206 0.68253968 0.76984127 0.63492063] mean value: 0.6869344707580002 key: train_roc_auc value: [0.74993441 0.76595953 0.74560889 0.74117705 0.7470916 0.73903824 0.73329636 0.73587368 0.72239984 0.7513376 ] mean value: 0.7431717191049403 key: test_jcc value: [0.46875 0.41935484 0.5 0.57692308 0.69230769 0.57692308 0.73076923 0.5862069 0.64 0.51724138] mean value: 0.5708476191494823 key: train_jcc value: [0.63135593 0.64655172 0.62288136 0.6352459 0.6302521 0.61924686 0.6125 0.61666667 0.59751037 0.64166667] mean value: 0.6253877583455207 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01332045 0.01391959 0.01815677 0.01369214 0.024189 0.02506042 0.03361917 0.0132997 0.01338696 0.013623 ] mean value: 0.01822671890258789 key: score_time value: [0.12917995 0.067523 0.04327273 0.0376215 0.04688621 0.05427647 0.0461905 0.04015064 0.05202198 0.05504966] mean value: 0.05721726417541504 key: test_mcc value: [ 0.12338338 -0.13789005 0.11968254 0.26162798 0.42319443 0.38095238 0.27217941 0.43102253 0.32530002 0.11968254] mean value: 0.23191351778019928 key: train_mcc value: [0.50865848 0.56255301 0.48765473 0.51619414 0.48124505 0.46286069 0.48106234 0.53884115 0.48041774 0.49810084] mean value: 0.5017588175290625 key: test_accuracy value: [0.575 0.46153846 0.56410256 0.64102564 0.71794872 0.69230769 0.64102564 0.71794872 0.66666667 0.56410256] mean value: 0.6241666666666666 key: train_accuracy value: [0.75783476 0.78409091 0.74715909 0.76136364 0.74431818 0.73579545 0.74431818 0.77272727 0.74431818 0.75284091] mean value: 0.7544766576016576 key: test_fscore value: [0.65306122 0.57142857 0.60465116 0.69565217 0.75555556 0.71428571 0.69565217 0.75555556 0.71111111 0.60465116] mean value: 0.6761604405833787 key: train_fscore value: [0.79115479 0.81 0.77468354 0.79207921 0.77722772 0.77372263 0.78365385 0.80392157 0.7804878 0.78832117] mean value: 0.7875252281431442 key: test_precision value: [0.59259259 0.51851852 0.61904762 0.66666667 0.73913043 0.71428571 0.64 0.70833333 0.66666667 0.59090909] mean value: 0.6456150636802811 key: train_precision value: [0.75233645 0.7826087 0.75742574 0.75829384 0.74407583 0.73271889 0.73423423 0.76635514 0.74074074 0.74654378] mean value: 0.7515333343043958 key: test_recall value: [0.72727273 0.63636364 0.59090909 0.72727273 0.77272727 0.71428571 0.76190476 0.80952381 0.76190476 0.61904762] mean value: 0.7121212121212122 key: train_recall value: [0.83419689 0.83937824 0.79274611 0.82901554 0.8134715 0.81958763 0.84020619 0.84536082 0.82474227 0.83505155] mean value: 0.8273756743763687 key: test_roc_auc value: [0.55808081 0.43582888 0.56016043 0.62834225 0.70989305 0.69047619 0.63095238 0.71031746 0.65873016 0.55952381] mean value: 0.6142305407011289 key: train_roc_auc value: [0.74937693 0.77817969 0.74228501 0.75413041 0.73692443 0.72624951 0.73339423 0.76445256 0.73515594 0.74347514] mean value: 0.7463623853929452 key: test_jcc value: [0.48484848 0.4 0.43333333 0.53333333 0.60714286 0.55555556 0.53333333 0.60714286 0.55172414 0.43333333] mean value: 0.5139747225954122 key: train_jcc value: [0.65447154 0.68067227 0.6322314 0.6557377 0.63562753 0.63095238 0.64426877 0.67213115 0.64 0.65060241] mean value: 0.6496695166699569 MCC on Blind test: 0.2 Accuracy on Blind test: 0.61 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02586198 0.02531052 0.04850888 0.04493904 0.02481985 0.05120564 0.0254631 0.02536511 0.0249207 0.02555561] mean value: 0.03219504356384277 key: score_time value: [0.01615477 0.01599693 0.03579712 0.01573038 0.01562381 0.02733827 0.01584196 0.01563573 0.01561165 0.01565671] mean value: 0.018938732147216798 key: test_mcc value: [0.23071239 0.25338873 0.3180697 0.6383069 0.58501794 0.43535772 0.60864099 0.46953014 0.64246755 0.43102253] mean value: 0.4612514587273892 key: train_mcc value: [0.69587587 0.71302795 0.68537043 0.6397475 0.70815606 0.66173069 0.65952731 0.6904032 0.66753551 0.65311656] mean value: 0.6774491102165359 key: test_accuracy value: [0.625 0.64102564 0.66666667 0.82051282 0.79487179 0.71794872 0.79487179 0.71794872 0.82051282 0.71794872] mean value: 0.7317307692307692 key: train_accuracy value: [0.84900285 0.85795455 0.84375 0.82102273 0.85511364 0.82954545 0.82954545 0.84659091 0.83522727 0.82670455] mean value: 0.8394457394457394 key: test_fscore value: [0.70588235 0.72 0.71111111 0.8372093 0.83333333 0.73170732 0.83333333 0.78431373 0.84444444 0.75555556] mean value: 0.7756890475607903 key: train_fscore value: [0.8691358 0.87437186 0.86419753 0.84596577 0.87344913 0.85781991 0.85645933 0.86699507 0.85784314 0.85371703] mean value: 0.8619954567196848 key: test_precision value: [0.62068966 0.64285714 0.69565217 0.85714286 0.76923077 0.75 0.74074074 0.66666667 0.79166667 0.70833333] mean value: 0.7242980005723634 key: train_precision value: [0.83018868 0.84878049 0.8254717 0.80092593 0.83809524 0.79385965 0.79910714 0.83018868 0.81775701 0.79820628] mean value: 0.8182580787782466 key: test_recall value: [0.81818182 0.81818182 0.72727273 0.81818182 0.90909091 0.71428571 0.95238095 0.95238095 0.9047619 0.80952381] mean value: 0.8424242424242424 key: train_recall value: [0.9119171 0.9015544 0.90673575 0.89637306 0.9119171 0.93298969 0.92268041 0.90721649 0.90206186 0.91752577] mean value: 0.9110971636130548 key: test_roc_auc value: [0.60353535 0.61497326 0.65775401 0.82085561 0.77807487 0.71825397 0.78174603 0.6984127 0.81349206 0.71031746] mean value: 0.7197415329768271 key: train_roc_auc value: [0.8420345 0.85329293 0.83701567 0.8129664 0.84904031 0.81776067 0.81893514 0.8396842 0.82761321 0.81635782] mean value: 0.83147008487157 key: test_jcc value: [0.54545455 0.5625 0.55172414 0.72 0.71428571 0.57692308 0.71428571 0.64516129 0.73076923 0.60714286] mean value: 0.6368246567114754 key: train_jcc value: [0.76855895 0.77678571 0.76086957 0.73305085 0.7753304 0.75103734 0.74895397 0.76521739 0.75107296 0.74476987] mean value: 0.7575647021850033 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.91022301 3.10206938 2.4169178 2.24846649 2.14688158 3.32444572 3.17221498 2.70555782 2.88934183 2.95270324] mean value: 2.786882185935974 key: score_time value: [0.02126956 0.05364895 0.02685308 0.01277757 0.03223562 0.05568361 0.01993418 0.02448583 0.0377295 0.02343059] mean value: 0.03080484867095947 key: test_mcc value: [0.28721348 0.3180697 0.04905525 0.63344389 0.63344389 0.24935216 0.58730159 0.38575837 0.39894181 0.53826045] mean value: 0.40808405915621887 key: train_mcc value: [0.98276047 0.98281969 0.96558803 0.97705869 0.95984626 0.97707007 0.97703249 0.96572357 0.98278047 0.97707007] mean value: 0.9747749810147373 key: test_accuracy value: [0.65 0.66666667 0.53846154 0.82051282 0.82051282 0.61538462 0.79487179 0.69230769 0.69230769 0.76923077] mean value: 0.706025641025641 key: train_accuracy value: [0.99145299 0.99147727 0.98295455 0.98863636 0.98011364 0.98863636 0.98863636 0.98295455 0.99147727 0.98863636] mean value: 0.9874975718725718 key: test_fscore value: [0.69565217 0.71111111 0.60869565 0.84444444 0.84444444 0.59459459 0.80952381 0.75 0.68421053 0.7804878 ] mean value: 0.7323164561399199 key: train_fscore value: [0.99220779 0.99220779 0.98445596 0.98963731 0.98191214 0.98974359 0.98969072 0.98469388 0.99228792 0.98974359] mean value: 0.9886580689792606 key: test_precision value: [0.66666667 0.69565217 0.58333333 0.82608696 0.82608696 0.6875 0.80952381 0.66666667 0.76470588 0.8 ] mean value: 0.7326222445499939 key: train_precision value: [0.99479167 0.99479167 0.98445596 0.98963731 0.97938144 0.98469388 0.98969072 0.97474747 0.98974359 0.98469388] mean value: 0.9866627582123597 key: test_recall value: [0.72727273 0.72727273 0.63636364 0.86363636 0.86363636 0.52380952 0.80952381 0.85714286 0.61904762 0.76190476] mean value: 0.7389610389610389 key: train_recall value: [0.98963731 0.98963731 0.98445596 0.98963731 0.98445596 0.99484536 0.98969072 0.99484536 0.99484536 0.99484536] mean value: 0.9906895999145344 key: test_roc_auc value: [0.64141414 0.65775401 0.52406417 0.81417112 0.81417112 0.62301587 0.79365079 0.67857143 0.6984127 0.76984127] mean value: 0.7015066632713691 key: train_roc_auc value: [0.9916541 0.991674 0.98279402 0.98852934 0.97964936 0.98792901 0.98851625 0.9815999 0.99109357 0.98792901] mean value: 0.9871368547299765 key: test_jcc value: [0.53333333 0.55172414 0.4375 0.73076923 0.73076923 0.42307692 0.68 0.6 0.52 0.64 ] mean value: 0.5847172855879752 key: train_jcc value: [0.98453608 0.98453608 0.96938776 0.97948718 0.96446701 0.97969543 0.97959184 0.96984925 0.98469388 0.97969543] mean value: 0.9775939928074848 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03906608 0.0292635 0.05773568 0.03004766 0.03187251 0.03294325 0.03356886 0.0385108 0.03023219 0.04849863] mean value: 0.037173914909362796 key: score_time value: [0.01301384 0.02037883 0.01303339 0.01304865 0.01284504 0.01281381 0.01273775 0.0128572 0.01312661 0.03527904] mean value: 0.01591341495513916 key: test_mcc value: [0.14591299 0.53458203 0.58048707 0.42319443 0.5828877 0.62620255 0.4866238 0.59384599 0.49076688 0.53674504] mean value: 0.5001248476975677 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.575 0.76923077 0.79487179 0.71794872 0.79487179 0.79487179 0.74358974 0.79487179 0.74358974 0.76923077] mean value: 0.7498076923076923 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.60465116 0.79069767 0.82608696 0.75555556 0.81818182 0.77777778 0.7826087 0.82608696 0.75 0.8 ] mean value: 0.7731646597420107 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.61904762 0.80952381 0.79166667 0.73913043 0.81818182 0.93333333 0.72 0.76 0.78947368 0.75 ] mean value: 0.7730357365746382 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.59090909 0.77272727 0.86363636 0.77272727 0.81818182 0.66666667 0.85714286 0.9047619 0.71428571 0.85714286] mean value: 0.7818181818181819 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.57323232 0.76871658 0.78475936 0.70989305 0.79144385 0.80555556 0.73412698 0.78571429 0.74603175 0.76190476] mean value: 0.7461378490790256 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.43333333 0.65384615 0.7037037 0.60714286 0.69230769 0.63636364 0.64285714 0.7037037 0.6 0.66666667] mean value: 0.633992488992489 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.17244458 0.21761703 0.25768661 0.2097497 0.16652131 0.26184726 0.16725016 0.16593075 0.21627688 0.16753721] mean value: 0.2002861499786377 key: score_time value: [0.02551651 0.0255897 0.02410817 0.02407432 0.02434325 0.02443552 0.02443457 0.02423859 0.0243299 0.02459693] mean value: 0.02456674575805664 key: test_mcc value: [0.17545379 0.2045323 0.52791444 0.60639156 0.5828877 0.18205868 0.4866238 0.59160798 0.33245498 0.45848623] mean value: 0.41484114591188126 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.6 0.61538462 0.76923077 0.79487179 0.79487179 0.58974359 0.74358974 0.76923077 0.66666667 0.71794872] mean value: 0.7061538461538461 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.68 0.68085106 0.80851064 0.8 0.81818182 0.6 0.7826087 0.82352941 0.68292683 0.7027027 ] mean value: 0.7379311159697353 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.60714286 0.64 0.76 0.88888889 0.81818182 0.63157895 0.72 0.7 0.7 0.8125 ] mean value: 0.7278292511581985 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77272727 0.72727273 0.86363636 0.72727273 0.81818182 0.57142857 0.85714286 1. 0.66666667 0.61904762] mean value: 0.7623376623376623 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.58080808 0.59893048 0.75534759 0.80481283 0.79144385 0.59126984 0.73412698 0.75 0.66666667 0.72619048] mean value: 0.6999596808420339 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.51515152 0.51612903 0.67857143 0.66666667 0.69230769 0.42857143 0.64285714 0.7 0.51851852 0.54166667] mean value: 0.5900440091569124 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01549053 0.01434684 0.01424766 0.01411343 0.01413631 0.02322936 0.01407123 0.01369977 0.01412392 0.01400304] mean value: 0.015146207809448243 key: score_time value: [0.01235962 0.01235962 0.01237845 0.01249194 0.01238823 0.03227663 0.01191926 0.01195526 0.01224375 0.01255798] mean value: 0.014293074607849121 key: test_mcc value: [ 0.18463724 0.26162798 0.28117601 0.22340742 0.32713229 0.43085716 0.04948717 -0.03174603 0.11385501 0.20331252] mean value: 0.2043746744064925 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.6 0.64102564 0.64102564 0.61538462 0.66666667 0.71794872 0.51282051 0.48717949 0.56410256 0.58974359] mean value: 0.6035897435897436 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.65217391 0.69565217 0.66666667 0.65116279 0.69767442 0.74418605 0.45714286 0.52380952 0.62222222 0.55555556] mean value: 0.6266246168167301 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.625 0.66666667 0.7 0.66666667 0.71428571 0.72727273 0.57142857 0.52380952 0.58333333 0.66666667] mean value: 0.644512987012987 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.68181818 0.72727273 0.63636364 0.63636364 0.68181818 0.76190476 0.38095238 0.52380952 0.66666667 0.47619048] mean value: 0.6173160173160173 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.59090909 0.62834225 0.64171123 0.61229947 0.6644385 0.71428571 0.52380952 0.48412698 0.55555556 0.59920635] mean value: 0.6014684661743485 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.48387097 0.53333333 0.5 0.48275862 0.53571429 0.59259259 0.2962963 0.35483871 0.4516129 0.38461538] mean value: 0.4615633093886709 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.55 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.69135904 2.70917821 2.36334229 2.38255072 2.58042264 1.61714864 1.72178507 1.6445508 1.63154292 1.63047242] mean value: 2.0972352743148805 key: score_time value: [0.14091563 0.12506819 0.12484097 0.21970868 0.09121585 0.0933814 0.0977962 0.09049273 0.09123826 0.09371853] mean value: 0.11683764457702636 key: test_mcc value: [0.33734954 0.42319443 0.52791444 0.56417112 0.5828877 0.41475753 0.53674504 0.59160798 0.54761905 0.65079365] mean value: 0.5177040486022235 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.675 0.71794872 0.76923077 0.76923077 0.79487179 0.69230769 0.76923077 0.76923077 0.76923077 0.82051282] mean value: 0.7546794871794872 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72340426 0.75555556 0.80851064 0.76923077 0.81818182 0.66666667 0.8 0.82352941 0.76923077 0.82051282] mean value: 0.7754822704760127 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.68 0.73913043 0.76 0.88235294 0.81818182 0.8 0.75 0.7 0.83333333 0.88888889] mean value: 0.7851887416363119 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77272727 0.77272727 0.86363636 0.68181818 0.81818182 0.57142857 0.85714286 1. 0.71428571 0.76190476] mean value: 0.7813852813852814 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66414141 0.70989305 0.75534759 0.78208556 0.79144385 0.70238095 0.76190476 0.75 0.77380952 0.82539683] mean value: 0.7516403531109414 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.56666667 0.60714286 0.67857143 0.625 0.69230769 0.5 0.66666667 0.7 0.625 0.69565217] mean value: 0.6357007485268354 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [1.86898923 0.97265959 0.94574857 0.97016168 0.99996591 0.94538593 0.92937469 0.92096901 0.91902447 0.92951322] mean value: 1.040179228782654 key: score_time value: [0.20569086 0.22888517 0.16473842 0.17857552 0.1652112 0.16618681 0.1822207 0.23874497 0.18768191 0.16507244] mean value: 0.18830080032348634 key: test_mcc value: [0.4411494 0.68677344 0.52791444 0.50266669 0.68677344 0.52048004 0.59384599 0.63427033 0.49076688 0.64116318] mean value: 0.5725803825612074 key: train_mcc value: [0.89658633 0.86227439 0.88013971 0.86240727 0.88543883 0.87963958 0.86809902 0.89152861 0.88512667 0.88528567] mean value: 0.8796526074485547 key: test_accuracy value: [0.725 0.84615385 0.76923077 0.74358974 0.84615385 0.74358974 0.79487179 0.79487179 0.74358974 0.82051282] mean value: 0.7827564102564103 key: train_accuracy value: [0.94871795 0.93181818 0.94034091 0.93181818 0.94318182 0.94034091 0.93465909 0.94602273 0.94318182 0.94318182] mean value: 0.9403263403263403 key: test_fscore value: [0.76595745 0.86956522 0.80851064 0.75 0.86956522 0.72222222 0.82608696 0.84 0.75 0.82926829] mean value: 0.803117599131588 key: train_fscore value: [0.95408163 0.93846154 0.94683544 0.93877551 0.94897959 0.94683544 0.94177215 0.95214106 0.94897959 0.94923858] mean value: 0.9466100539581546 key: test_precision value: [0.72 0.83333333 0.76 0.83333333 0.83333333 0.86666667 0.76 0.72413793 0.78947368 0.85 ] mean value: 0.7970278281911676 key: train_precision value: [0.93969849 0.92893401 0.92574257 0.92462312 0.93467337 0.93034826 0.92537313 0.93103448 0.93939394 0.935 ] mean value: 0.9314821374471468 key: test_recall value: [0.81818182 0.90909091 0.86363636 0.68181818 0.90909091 0.61904762 0.9047619 1. 0.71428571 0.80952381] mean value: 0.822943722943723 key: train_recall value: [0.96891192 0.94818653 0.96891192 0.95336788 0.96373057 0.96391753 0.95876289 0.9742268 0.95876289 0.96391753] mean value: 0.9622696437156135 key: test_roc_auc value: [0.71464646 0.8368984 0.75534759 0.7526738 0.8368984 0.75396825 0.78571429 0.77777778 0.74603175 0.82142857] mean value: 0.7781385281385281 key: train_roc_auc value: [0.94648128 0.93006811 0.93728615 0.92951413 0.94098478 0.93765497 0.93191309 0.9428096 0.94140676 0.94081952] mean value: 0.9378938378597175 key: test_jcc value: [0.62068966 0.76923077 0.67857143 0.6 0.76923077 0.56521739 0.7037037 0.72413793 0.6 0.70833333] mean value: 0.6739114981581249 key: train_jcc value: [0.91219512 0.88405797 0.89903846 0.88461538 0.90291262 0.89903846 0.88995215 0.90865385 0.90291262 0.90338164] mean value: 0.8986758285152439 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0247333 0.00965047 0.00976729 0.00986314 0.00958753 0.00968504 0.01000524 0.00972891 0.00969481 0.00996637] mean value: 0.011268210411071778 key: score_time value: [0.0093112 0.00866818 0.00882983 0.00918913 0.0087862 0.00873756 0.00918365 0.00884771 0.00886369 0.0089376 ] mean value: 0.00893547534942627 key: test_mcc value: [0.12974982 0.06149733 0.28117601 0.44298485 0.5828877 0.43535772 0.64246755 0.37940161 0.53826045 0.27348302] mean value: 0.37672660455215623 key: train_mcc value: [0.49959039 0.53088207 0.49047512 0.48722902 0.49476397 0.47780376 0.46632656 0.47174736 0.44408311 0.50475435] mean value: 0.4867655710273362 key: test_accuracy value: [0.575 0.53846154 0.64102564 0.71794872 0.79487179 0.71794872 0.82051282 0.69230769 0.76923077 0.64102564] mean value: 0.6908333333333333 key: train_accuracy value: [0.75213675 0.76704545 0.74715909 0.74715909 0.75 0.74147727 0.73579545 0.73863636 0.72443182 0.75568182] mean value: 0.7459523115773116 key: test_fscore value: [0.63829787 0.59090909 0.66666667 0.73170732 0.81818182 0.73170732 0.84444444 0.73913043 0.7804878 0.68181818] mean value: 0.7223350948167626 key: train_fscore value: [0.77402597 0.78534031 0.76762402 0.77694236 0.77319588 0.76485788 0.75968992 0.7628866 0.74805195 0.78172589] mean value: 0.7694340779160749 key: test_precision value: [0.6 0.59090909 0.7 0.78947368 0.81818182 0.75 0.79166667 0.68 0.8 0.65217391] mean value: 0.717240517301158 key: train_precision value: [0.77604167 0.79365079 0.77368421 0.75242718 0.76923077 0.76683938 0.76165803 0.7628866 0.7539267 0.77 ] mean value: 0.7680345333375814 key: test_recall value: [0.68181818 0.59090909 0.63636364 0.68181818 0.81818182 0.71428571 0.9047619 0.80952381 0.76190476 0.71428571] mean value: 0.7313852813852814 key: train_recall value: [0.77202073 0.77720207 0.76165803 0.80310881 0.77720207 0.7628866 0.75773196 0.7628866 0.74226804 0.79381443] mean value: 0.7710779338710538 key: test_roc_auc value: [0.56313131 0.53074866 0.64171123 0.72326203 0.79144385 0.71825397 0.81349206 0.68253968 0.76984127 0.63492063] mean value: 0.6869344707580002 key: train_roc_auc value: [0.74993441 0.76595953 0.74560889 0.74117705 0.7470916 0.73903824 0.73329636 0.73587368 0.72239984 0.7513376 ] mean value: 0.7431717191049403 key: test_jcc value: [0.46875 0.41935484 0.5 0.57692308 0.69230769 0.57692308 0.73076923 0.5862069 0.64 0.51724138] mean value: 0.5708476191494823 key: train_jcc value: [0.63135593 0.64655172 0.62288136 0.6352459 0.6302521 0.61924686 0.6125 0.61666667 0.59751037 0.64166667] mean value: 0.6253877583455207 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [2.41412258 1.95130968 0.16859365 0.86095262 0.63910151 0.23897123 1.51896191 0.15470123 0.44391131 1.22929597] mean value: 0.9619921684265137 key: score_time value: [0.01328516 0.01376867 0.01158524 0.02055979 0.0114162 0.01436687 0.01219177 0.01202273 0.01257467 0.01273799] mean value: 0.013450908660888671 key: test_mcc value: [0.5959596 0.59153067 0.74350254 0.59153067 0.68716578 0.56305327 0.70106818 0.7200823 0.56305327 0.74203177] mean value: 0.6498978041274719 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.79487179 0.87179487 0.79487179 0.84615385 0.76923077 0.84615385 0.84615385 0.76923077 0.87179487] mean value: 0.821025641025641 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.81818182 0.80952381 0.89361702 0.80952381 0.86363636 0.75675676 0.86956522 0.875 0.75675676 0.88372093] mean value: 0.8336282483279773 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 0.85 0.84 0.85 0.86363636 0.875 0.8 0.77777778 0.875 0.86363636] mean value: 0.8413232323232324 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.77272727 0.95454545 0.77272727 0.86363636 0.66666667 0.95238095 1. 0.66666667 0.9047619 ] mean value: 0.8372294372294372 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.7979798 0.79812834 0.85962567 0.79812834 0.84358289 0.77777778 0.83730159 0.83333333 0.77777778 0.86904762] mean value: 0.8192683133859604 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.69230769 0.68 0.80769231 0.68 0.76 0.60869565 0.76923077 0.77777778 0.60869565 0.79166667] mean value: 0.717606651802304 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.7 Accuracy on Blind test: 0.85 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05918598 0.08959985 0.13512325 0.04787421 0.03671384 0.12405276 0.09519744 0.09040046 0.09243035 0.09333897] mean value: 0.08639171123504638 key: score_time value: [0.03301191 0.02424884 0.03447104 0.01281452 0.0169549 0.03200459 0.033674 0.016186 0.02198958 0.02257538] mean value: 0.024793076515197753 key: test_mcc value: [0.49236596 0.04905525 0.19149207 0.52791444 0.47420071 0.17460317 0.54554473 0.49719968 0.44444444 0.56305327] mean value: 0.3959873728319808 key: train_mcc value: [0.79270453 0.75889909 0.79327937 0.74724952 0.79904804 0.81626246 0.77032494 0.76998824 0.80452977 0.76428139] mean value: 0.7816567362861676 key: test_accuracy value: [0.75 0.53846154 0.58974359 0.76923077 0.74358974 0.58974359 0.76923077 0.74358974 0.71794872 0.76923077] mean value: 0.698076923076923 key: train_accuracy value: [0.8974359 0.88068182 0.89772727 0.875 0.90056818 0.90909091 0.88636364 0.88636364 0.90340909 0.88352273] mean value: 0.892016317016317 key: test_fscore value: [0.7826087 0.60869565 0.6 0.80851064 0.7826087 0.61904762 0.80851064 0.79166667 0.71794872 0.75675676] mean value: 0.7276354080493765 key: train_fscore value: [0.90862944 0.89175258 0.90769231 0.8877551 0.91002571 0.91919192 0.89690722 0.89795918 0.91326531 0.89514066] mean value: 0.902831942606227 key: test_precision value: [0.75 0.58333333 0.66666667 0.76 0.75 0.61904762 0.73076923 0.7037037 0.77777778 0.875 ] mean value: 0.7216298331298331 key: train_precision value: [0.89054726 0.88717949 0.89847716 0.87437186 0.90306122 0.9009901 0.89690722 0.88888889 0.9040404 0.88832487] mean value: 0.893278847353825 key: test_recall value: [0.81818182 0.63636364 0.54545455 0.86363636 0.81818182 0.61904762 0.9047619 0.9047619 0.66666667 0.66666667] mean value: 0.7443722943722944 key: train_recall value: [0.92746114 0.89637306 0.91709845 0.9015544 0.91709845 0.93814433 0.89690722 0.90721649 0.92268041 0.90206186] mean value: 0.912659580150633 key: test_roc_auc value: [0.74242424 0.52406417 0.59625668 0.75534759 0.73262032 0.58730159 0.75793651 0.73015873 0.72222222 0.77777778] mean value: 0.6926109837874543 key: train_roc_auc value: [0.89411032 0.87900414 0.89565614 0.87216085 0.8988008 0.90578103 0.88516247 0.88398799 0.90121362 0.88141067] mean value: 0.8897288028927673 key: test_jcc value: [0.64285714 0.4375 0.42857143 0.67857143 0.64285714 0.44827586 0.67857143 0.65517241 0.56 0.60869565] mean value: 0.5781072499464553 key: train_jcc value: [0.83255814 0.80465116 0.83098592 0.79816514 0.83490566 0.85046729 0.81308411 0.81481481 0.84037559 0.81018519] mean value: 0.8230193004534195 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01592779 0.01515031 0.01602578 0.01546907 0.01665211 0.01538968 0.01587939 0.01483941 0.0104537 0.01027417] mean value: 0.014606142044067382 key: score_time value: [0.01331186 0.01377368 0.0143106 0.01406026 0.01515913 0.01358914 0.01425743 0.01117134 0.0091939 0.00872111] mean value: 0.01275484561920166 key: test_mcc value: [0.23071239 0.15534161 0.42319443 0.32713229 0.58048707 0.43535772 0.46953014 0.38575837 0.37805005 0.34126984] mean value: 0.3726833915774863 key: train_mcc value: [0.43234492 0.44054161 0.39960965 0.42287405 0.40527156 0.39251133 0.41594087 0.42184628 0.40408608 0.41606682] mean value: 0.4151093163669143 key: test_accuracy value: [0.625 0.58974359 0.71794872 0.66666667 0.79487179 0.71794872 0.71794872 0.69230769 0.69230769 0.66666667] mean value: 0.6881410256410256 key: train_accuracy value: [0.72079772 0.72443182 0.70454545 0.71590909 0.70738636 0.70170455 0.71306818 0.71590909 0.70738636 0.71306818] mean value: 0.7124206811706811 key: test_fscore value: [0.70588235 0.65217391 0.75555556 0.69767442 0.82608696 0.73170732 0.78431373 0.75 0.72727273 0.66666667] mean value: 0.7297333633169361 key: train_fscore value: [0.75980392 0.76399027 0.75 0.75609756 0.74939173 0.74327628 0.75305623 0.75490196 0.75060533 0.75184275] mean value: 0.7532966035519044 key: test_precision value: [0.62068966 0.625 0.73913043 0.71428571 0.79166667 0.75 0.66666667 0.66666667 0.69565217 0.72222222] mean value: 0.6991980200376002 key: train_precision value: [0.72093023 0.72018349 0.69955157 0.71428571 0.70642202 0.70697674 0.71627907 0.71962617 0.70776256 0.71830986] mean value: 0.7130327419348079 key: test_recall value: [0.81818182 0.68181818 0.77272727 0.68181818 0.86363636 0.71428571 0.95238095 0.85714286 0.76190476 0.61904762] mean value: 0.7722943722943723 key: train_recall value: [0.80310881 0.8134715 0.80829016 0.80310881 0.79792746 0.78350515 0.79381443 0.79381443 0.79896907 0.78865979] mean value: 0.7984669622349233 key: test_roc_auc value: [0.60353535 0.57620321 0.70989305 0.6644385 0.78475936 0.71825397 0.6984127 0.67857143 0.68650794 0.67063492] mean value: 0.6791210423563364 key: train_roc_auc value: [0.71168099 0.71491185 0.69345325 0.70658585 0.69770587 0.69238549 0.70386924 0.7070338 0.69695289 0.70445648] mean value: 0.702903571078452 key: test_jcc value: [0.54545455 0.48387097 0.60714286 0.53571429 0.7037037 0.57692308 0.64516129 0.6 0.57142857 0.5 ] mean value: 0.5769399298431557 key: train_jcc value: [0.61264822 0.61811024 0.6 0.60784314 0.59922179 0.59143969 0.60392157 0.60629921 0.60077519 0.6023622 ] mean value: 0.6042621253167205 MCC on Blind test: 0.26 Accuracy on Blind test: 0.64 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01936674 0.01978564 0.01572657 0.01591492 0.01743746 0.01959467 0.01773 0.0185678 0.01736331 0.01891422] mean value: 0.01804013252258301 key: score_time value: [0.00950718 0.01122332 0.01133323 0.01174617 0.01184201 0.01192522 0.01185989 0.01185942 0.01192927 0.01191807] mean value: 0.011514377593994141 key: test_mcc value: [0.20100756 0.43117497 0.2266439 0.5464364 0.6947088 0.33410548 0.64246755 0.16891598 0.47837594 0.54554473] mean value: 0.426938130250252 key: train_mcc value: [0.45569532 0.71555702 0.53425074 0.65630427 0.68921205 0.64990271 0.68603947 0.68075596 0.65541085 0.57746071] mean value: 0.6300589116802254 key: test_accuracy value: [0.6 0.71794872 0.58974359 0.76923077 0.84615385 0.66666667 0.82051282 0.58974359 0.71794872 0.76923077] mean value: 0.7087179487179487 key: train_accuracy value: [0.6951567 0.85795455 0.71590909 0.82954545 0.84090909 0.81534091 0.84375 0.82954545 0.8125 0.76704545] mean value: 0.8007656695156695 key: test_fscore value: [0.72413793 0.78431373 0.55555556 0.7804878 0.875 0.73469388 0.84444444 0.63636364 0.68571429 0.80851064] mean value: 0.7429221899329542 key: train_fscore value: [0.78296146 0.87745098 0.65517241 0.84375 0.86792453 0.85327314 0.85564304 0.82758621 0.80588235 0.82478632] mean value: 0.8194430449874387 key: test_precision value: [0.58333333 0.68965517 0.71428571 0.84210526 0.80769231 0.64285714 0.79166667 0.60869565 0.85714286 0.73076923] mean value: 0.7268203340492854 key: train_precision value: [0.64333333 0.83255814 0.97938144 0.84816754 0.7965368 0.75903614 0.87165775 0.93506494 0.93835616 0.70437956] mean value: 0.8308471812052299 key: test_recall value: [0.95454545 0.90909091 0.45454545 0.72727273 0.95454545 0.85714286 0.9047619 0.66666667 0.57142857 0.9047619 ] mean value: 0.7904761904761904 key: train_recall value: [1. 0.92746114 0.49222798 0.83937824 0.95336788 0.9742268 0.84020619 0.74226804 0.70618557 0.99484536] mean value: 0.8470167191923508 key: test_roc_auc value: [0.56060606 0.68983957 0.60962567 0.77540107 0.8302139 0.65079365 0.81349206 0.58333333 0.73015873 0.75793651] mean value: 0.7001400560224089 key: train_roc_auc value: [0.66139241 0.85052302 0.73982468 0.82849415 0.8288852 0.79723998 0.84415373 0.83948845 0.82461177 0.74109357] mean value: 0.7955706953974652 key: test_jcc value: [0.56756757 0.64516129 0.38461538 0.64 0.77777778 0.58064516 0.73076923 0.46666667 0.52173913 0.67857143] mean value: 0.5993513638015742 key: train_jcc value: [0.64333333 0.78165939 0.48717949 0.72972973 0.76666667 0.74409449 0.74770642 0.70588235 0.67487685 0.70181818] mean value: 0.6982946897812828 MCC on Blind test: 0.44 Accuracy on Blind test: 0.7 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02165914 0.03685188 0.02395153 0.01961398 0.01859236 0.02534246 0.02067137 0.0190196 0.02112269 0.01930928] mean value: 0.02261343002319336 key: score_time value: [0.01186466 0.01319265 0.01339817 0.01180243 0.0114572 0.01194096 0.0117588 0.01214409 0.01232433 0.01206565] mean value: 0.01219489574432373 key: test_mcc value: [0.32824398 0.32178399 0.2513369 0.63570849 0.39777409 0.37115374 0.54554473 0.54870326 0.34293954 0.64116318] mean value: 0.43843519028638167 key: train_mcc value: [0.62855308 0.72812128 0.65071816 0.70660114 0.39479175 0.67705292 0.73872051 0.75859297 0.49184251 0.68963745] mean value: 0.6464631763631339 key: test_accuracy value: [0.65 0.66666667 0.61538462 0.82051282 0.64102564 0.66666667 0.76923077 0.74358974 0.61538462 0.82051282] mean value: 0.7008974358974359 key: train_accuracy value: [0.78347578 0.85511364 0.80965909 0.84659091 0.61079545 0.83806818 0.86647727 0.87784091 0.68181818 0.84659091] mean value: 0.8016430328930328 key: test_fscore value: [0.63157895 0.75471698 0.61538462 0.85106383 0.5625 0.62857143 0.80851064 0.80769231 0.48275862 0.82926829] mean value: 0.6972045661606536 key: train_fscore value: [0.75949367 0.88221709 0.80118694 0.8744186 0.4497992 0.848 0.88836105 0.89638554 0.5971223 0.86567164] mean value: 0.7862656037262483 key: test_precision value: [0.75 0.64516129 0.70588235 0.8 0.9 0.78571429 0.73076923 0.67741935 0.875 0.85 ] mean value: 0.7719946514585984 key: train_precision value: [0.97560976 0.79583333 0.9375 0.79324895 1. 0.87845304 0.82378855 0.84162896 0.98809524 0.83653846] mean value: 0.8870696278417831 key: test_recall value: [0.54545455 0.90909091 0.54545455 0.90909091 0.40909091 0.52380952 0.9047619 1. 0.33333333 0.80952381] mean value: 0.688961038961039 key: train_recall value: [0.62176166 0.98963731 0.69948187 0.97409326 0.29015544 0.81958763 0.96391753 0.95876289 0.42783505 0.89690722] mean value: 0.7642139842957107 key: test_roc_auc value: [0.66161616 0.63101604 0.62566845 0.80748663 0.67513369 0.67857143 0.75793651 0.72222222 0.63888889 0.82142857] mean value: 0.7019968593498005 key: train_roc_auc value: [0.80138716 0.8407306 0.82143905 0.83295858 0.64507772 0.84017356 0.85537648 0.86862195 0.71075297 0.84085867] mean value: 0.8057376744183752 key: test_jcc value: [0.46153846 0.60606061 0.44444444 0.74074074 0.39130435 0.45833333 0.67857143 0.67741935 0.31818182 0.70833333] mean value: 0.5484927868868963 key: train_jcc value: [0.6122449 0.7892562 0.66831683 0.7768595 0.29015544 0.73611111 0.7991453 0.81222707 0.42564103 0.76315789] mean value: 0.6673115277406285 MCC on Blind test: 0.29 Accuracy on Blind test: 0.64 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18501496 0.22413325 0.14739823 0.14828968 0.1485734 0.15752697 0.16709542 0.14933062 0.17041063 0.21680474] mean value: 0.17145779132843017 key: score_time value: [0.01990271 0.02225757 0.01513314 0.01543474 0.01512551 0.01812315 0.01513577 0.01510811 0.0234859 0.01809883] mean value: 0.017780542373657227 key: test_mcc value: [0.49236596 0.5464364 0.52831916 0.55002604 0.6947088 0.38786415 0.74819006 0.44840472 0.44444444 0.6383069 ] mean value: 0.5479066635637294 key: train_mcc value: [0.97697908 0.93712371 0.94863098 0.93126941 0.95411738 0.93686739 0.94836906 0.95447318 0.93176511 0.93118204] mean value: 0.9450777327437905 key: test_accuracy value: [0.75 0.76923077 0.76923077 0.74358974 0.84615385 0.69230769 0.87179487 0.71794872 0.71794872 0.82051282] mean value: 0.7698717948717949 key: train_accuracy value: [0.98860399 0.96875 0.97443182 0.96590909 0.97727273 0.96875 0.97443182 0.97727273 0.96590909 0.96590909] mean value: 0.9727240352240353 key: test_fscore value: [0.7826087 0.7804878 0.8 0.72222222 0.875 0.7 0.88888889 0.7755102 0.71794872 0.8372093 ] mean value: 0.7879875835997265 key: train_fscore value: [0.98963731 0.97186701 0.9769821 0.96923077 0.97927461 0.97186701 0.97674419 0.97969543 0.96969697 0.96938776] mean value: 0.9754383141178787 key: test_precision value: [0.75 0.84210526 0.7826087 0.92857143 0.80769231 0.73684211 0.83333333 0.67857143 0.77777778 0.81818182] mean value: 0.795568415820132 key: train_precision value: [0.98963731 0.95959596 0.96464646 0.95939086 0.97927461 0.96446701 0.97927461 0.965 0.95049505 0.95959596] mean value: 0.9671377829861048 key: test_recall value: [0.81818182 0.72727273 0.81818182 0.59090909 0.95454545 0.66666667 0.95238095 0.9047619 0.66666667 0.85714286] mean value: 0.7956709956709956 key: train_recall value: [0.98963731 0.98445596 0.98963731 0.97927461 0.97927461 0.97938144 0.9742268 0.99484536 0.98969072 0.97938144] mean value: 0.983980556594199 key: test_roc_auc value: [0.74242424 0.77540107 0.76203209 0.76604278 0.8302139 0.69444444 0.86507937 0.70238095 0.72222222 0.81746032] mean value: 0.7677701383583736 key: train_roc_auc value: [0.98848954 0.96707075 0.97280607 0.96448007 0.97705869 0.96753882 0.97445517 0.97527078 0.96319979 0.96437427] mean value: 0.9714743958036675 key: test_jcc value: [0.64285714 0.64 0.66666667 0.56521739 0.77777778 0.53846154 0.8 0.63333333 0.56 0.72 ] mean value: 0.6544313850400807 key: train_jcc value: [0.97948718 0.94527363 0.955 0.94029851 0.95939086 0.94527363 0.95454545 0.960199 0.94117647 0.94059406] mean value: 0.9521238803090375 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0628612 0.08177304 0.0515604 0.07316899 0.06709433 0.06401539 0.07528853 0.07673025 0.06154585 0.06542706] mean value: 0.06794650554656982 key: score_time value: [0.02510285 0.02193046 0.01847267 0.03015375 0.01803255 0.02999711 0.02398348 0.0255239 0.02519321 0.03015709] mean value: 0.024854707717895507 key: test_mcc value: [0.40201513 0.35561497 0.69498222 0.48807911 0.74203177 0.52048004 0.69175116 0.48261709 0.71011643 0.54761905] mean value: 0.5635306969360055 key: train_mcc value: [0.954074 0.96006097 0.95483116 0.94842621 0.97736551 0.96031806 0.99427116 0.94836906 0.95478908 0.94383671] mean value: 0.9596341899966139 key: test_accuracy value: [0.7 0.66666667 0.84615385 0.74358974 0.87179487 0.74358974 0.84615385 0.74358974 0.84615385 0.76923077] mean value: 0.7776923076923077 key: train_accuracy value: [0.97720798 0.98011364 0.97727273 0.97443182 0.98863636 0.98011364 0.99715909 0.97443182 0.97727273 0.97159091] mean value: 0.9798230704480705 key: test_fscore value: [0.71428571 0.66666667 0.85714286 0.76190476 0.88372093 0.72222222 0.86363636 0.77272727 0.84210526 0.76923077] mean value: 0.7853642821207081 key: train_fscore value: [0.97916667 0.98172324 0.97894737 0.97662338 0.9895288 0.98172324 0.99742931 0.97674419 0.97905759 0.97368421] mean value: 0.9814627976826897 key: test_precision value: [0.75 0.76470588 0.9 0.8 0.9047619 0.86666667 0.82608696 0.73913043 0.94117647 0.83333333] mean value: 0.8325861649007429 key: train_precision value: [0.98429319 0.98947368 0.99465241 0.97916667 1. 0.99470899 0.99487179 0.97927461 0.99468085 0.99462366] mean value: 0.9905745858969144 key: test_recall value: [0.68181818 0.59090909 0.81818182 0.72727273 0.86363636 0.61904762 0.9047619 0.80952381 0.76190476 0.71428571] mean value: 0.7491341991341991 key: train_recall value: [0.97409326 0.97409326 0.96373057 0.97409326 0.97927461 0.96907216 1. 0.9742268 0.96391753 0.95360825] mean value: 0.9726109716361305 key: test_roc_auc value: [0.7020202 0.67780749 0.85026738 0.7459893 0.87299465 0.75396825 0.84126984 0.73809524 0.8531746 0.77380952] mean value: 0.7809396485867074 key: train_roc_auc value: [0.97755296 0.98075732 0.97872063 0.97446802 0.98963731 0.98137153 0.99683544 0.97445517 0.97879421 0.97363957] mean value: 0.9806232152982022 key: test_jcc value: [0.55555556 0.5 0.75 0.61538462 0.79166667 0.56521739 0.76 0.62962963 0.72727273 0.625 ] mean value: 0.6519726585813542 key: train_jcc value: [0.95918367 0.96410256 0.95876289 0.95431472 0.97927461 0.96410256 0.99487179 0.95454545 0.95897436 0.94871795] mean value: 0.9636850577593158 MCC on Blind test: 0.52 Accuracy on Blind test: 0.76 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.09267759 0.10598755 0.13946033 0.09499311 0.1663239 0.14514065 0.1052444 0.09951377 0.11682558 0.12099361] mean value: 0.11871604919433594 key: score_time value: [0.02805328 0.01415372 0.02762222 0.01744652 0.01424694 0.06440473 0.01440763 0.02244258 0.02212453 0.02216029] mean value: 0.024706244468688965 key: test_mcc value: [0.01609348 0.19821695 0.47532708 0.19821695 0.52831916 0.38095238 0.43643578 0.32713229 0.38786415 0.22340742] mean value: 0.31719656277699165 key: train_mcc value: [0.98276047 0.97705869 0.98281969 0.98280066 0.97714822 0.99427116 0.98851625 0.97707007 0.98278047 0.97723742] mean value: 0.9822463092364953 key: test_accuracy value: [0.525 0.61538462 0.74358974 0.61538462 0.76923077 0.69230769 0.71794872 0.66666667 0.69230769 0.61538462] mean value: 0.6653205128205129 key: train_accuracy value: [0.99145299 0.98863636 0.99147727 0.99147727 0.98863636 0.99715909 0.99431818 0.98863636 0.99147727 0.98863636] mean value: 0.9911907536907537 key: test_fscore value: [0.6122449 0.69387755 0.79166667 0.69387755 0.8 0.71428571 0.76595745 0.69767442 0.7 0.65116279] mean value: 0.7120747037063218 key: train_fscore value: [0.99220779 0.98963731 0.99220779 0.99224806 0.98958333 0.99742931 0.99484536 0.98974359 0.99228792 0.98979592] mean value: 0.9919986378049969 key: test_precision value: [0.55555556 0.62962963 0.73076923 0.62962963 0.7826087 0.71428571 0.69230769 0.68181818 0.73684211 0.63636364] mean value: 0.6789810071274602 key: train_precision value: [0.99479167 0.98963731 0.99479167 0.98969072 0.9947644 0.99487179 0.99484536 0.98469388 0.98974359 0.97979798] mean value: 0.9907628361377186 key: test_recall value: [0.68181818 0.77272727 0.86363636 0.77272727 0.81818182 0.71428571 0.85714286 0.71428571 0.66666667 0.66666667] mean value: 0.7528138528138528 key: train_recall value: [0.98963731 0.98963731 0.98963731 0.99481865 0.98445596 1. 0.99484536 0.99484536 0.99484536 1. ] mean value: 0.9932722610971636 key: test_roc_auc value: [0.50757576 0.59224599 0.72593583 0.59224599 0.76203209 0.69047619 0.70634921 0.66269841 0.69444444 0.61111111] mean value: 0.6545115015703251 key: train_roc_auc value: [0.9916541 0.98852934 0.991674 0.99112002 0.98908333 0.99683544 0.99425812 0.98792901 0.99109357 0.98734177] mean value: 0.9909518697413212 key: test_jcc value: [0.44117647 0.53125 0.65517241 0.53125 0.66666667 0.55555556 0.62068966 0.53571429 0.53846154 0.48275862] mean value: 0.5558695206641454 key: train_jcc value: [0.98453608 0.97948718 0.98453608 0.98461538 0.97938144 0.99487179 0.98974359 0.97969543 0.98469388 0.97979798] mean value: 0.9841358845786453 MCC on Blind test: 0.26 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.65745234 0.59602952 0.60603476 0.63420153 0.61747909 0.58377504 0.63561296 0.65917778 0.65637898 0.67190957] mean value: 0.631805157661438 key: score_time value: [0.00936556 0.00926352 0.01064134 0.01090002 0.00955009 0.00982285 0.0108304 0.01085973 0.01127481 0.01107192] mean value: 0.010358023643493652 key: test_mcc value: [0.50251891 0.64988795 0.74350254 0.71011643 0.79144385 0.69657235 0.70106818 0.65465367 0.59366961 0.69047619] mean value: 0.6733909681187302 key: train_mcc value: [1. 0.99427786 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.999427786446393 key: test_accuracy value: [0.75 0.82051282 0.87179487 0.84615385 0.8974359 0.84615385 0.84615385 0.82051282 0.79487179 0.84615385] mean value: 0.833974358974359 key: train_accuracy value: [1. 0.99715909 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9997159090909091 key: test_fscore value: [0.76190476 0.82926829 0.89361702 0.85 0.90909091 0.85 0.86956522 0.85106383 0.8 0.85714286] mean value: 0.847165288927659 key: train_fscore value: [1. 0.99741602 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9997416020671834 key: test_precision value: [0.8 0.89473684 0.84 0.94444444 0.90909091 0.89473684 0.8 0.76923077 0.84210526 0.85714286] mean value: 0.8551487927277401 key: train_precision value: [1. 0.99484536 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9994845360824742 key: test_recall value: [0.72727273 0.77272727 0.95454545 0.77272727 0.90909091 0.80952381 0.95238095 0.95238095 0.76190476 0.85714286] mean value: 0.8469696969696969 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75252525 0.82754011 0.85962567 0.85695187 0.89572193 0.84920635 0.83730159 0.80952381 0.79761905 0.8452381 ] mean value: 0.8331253713606654 key: train_roc_auc value: [1. 0.99685535 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.999685534591195 key: test_jcc value: [0.61538462 0.70833333 0.80769231 0.73913043 0.83333333 0.73913043 0.76923077 0.74074074 0.66666667 0.75 ] mean value: 0.7369642635946984 key: train_jcc value: [1. 0.99484536 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9994845360824742 MCC on Blind test: 0.63 Accuracy on Blind test: 0.82 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.12581205 0.08368993 0.05365229 0.05670118 0.05033875 0.04044056 0.03177214 0.08045316 0.04733801 0.04449892] mean value: 0.06146969795227051 key: score_time value: [0.02203012 0.0189116 0.02993417 0.03207183 0.02699876 0.01452136 0.02443647 0.01289511 0.02427244 0.02420735] mean value: 0.023027920722961427 key: test_mcc value: [0.05025189 0.13434332 0.32178399 0.00944069 0.53206037 0.44840472 0.31180478 0.31180478 0.3474523 0.10329663] mean value: 0.2570643476636058 key: train_mcc value: [0.47126888 0.44829265 0.44829265 0.44829265 0.42181028 0.44030757 0.44030757 0.44560397 0.44560397 0.44030757] mean value: 0.4450087760196973 key: test_accuracy value: [0.55 0.58974359 0.66666667 0.53846154 0.74358974 0.71794872 0.61538462 0.61538462 0.66666667 0.56410256] mean value: 0.6267948717948718 key: train_accuracy value: [0.7037037 0.69034091 0.69034091 0.69034091 0.67613636 0.6875 0.6875 0.69034091 0.69034091 0.6875 ] mean value: 0.6894044612794613 key: test_fscore value: [0.66666667 0.72413793 0.75471698 0.65384615 0.81481481 0.7755102 0.73684211 0.73684211 0.74509804 0.66666667] mean value: 0.7275141667984495 key: train_fscore value: [0.7877551 0.77979798 0.77979798 0.77979798 0.772 0.77911647 0.77911647 0.7806841 0.7806841 0.77911647] mean value: 0.779786664828065 key: test_precision value: [0.5625 0.58333333 0.64516129 0.56666667 0.6875 0.67857143 0.58333333 0.58333333 0.63333333 0.56666667] mean value: 0.6090399385560676 key: train_precision value: [0.64983165 0.63907285 0.63907285 0.63907285 0.6286645 0.63815789 0.63815789 0.64026403 0.64026403 0.63815789] mean value: 0.6390716425007821 key: test_recall value: [0.81818182 0.95454545 0.90909091 0.77272727 1. 0.9047619 1. 1. 0.9047619 0.80952381] mean value: 0.9073593073593074 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.52020202 0.53609626 0.63101604 0.5040107 0.70588235 0.70238095 0.58333333 0.58333333 0.6468254 0.54365079] mean value: 0.5956731177319412 key: train_roc_auc value: [0.67088608 0.6572327 0.6572327 0.6572327 0.64150943 0.65189873 0.65189873 0.65506329 0.65506329 0.65189873] mean value: 0.6549916407929305 key: test_jcc value: [0.5 0.56756757 0.60606061 0.48571429 0.6875 0.63333333 0.58333333 0.58333333 0.59375 0.5 ] mean value: 0.574059245934246 key: train_jcc value: [0.64983165 0.63907285 0.63907285 0.63907285 0.6286645 0.63815789 0.63815789 0.64026403 0.64026403 0.63815789] mean value: 0.6390716425007821 MCC on Blind test: 0.12 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.04989004 0.03763151 0.04120517 0.05915475 0.04827642 0.03862739 0.04097748 0.04380655 0.0387454 0.04259706] mean value: 0.04409117698669433 key: score_time value: [0.0314424 0.02498007 0.01469374 0.02595615 0.0258348 0.02157927 0.04579115 0.03543663 0.02513885 0.03504443] mean value: 0.028589749336242677 key: test_mcc value: [0.4411494 0.21294497 0.32713229 0.63344389 0.63344389 0.3539192 0.4866238 0.59160798 0.49076688 0.65079365] mean value: 0.4821825943663392 key: train_mcc value: [0.77522299 0.73593017 0.76499433 0.71848548 0.74731628 0.75280405 0.76436443 0.76997315 0.74696056 0.74696087] mean value: 0.7523012309263739 key: test_accuracy value: [0.725 0.61538462 0.66666667 0.82051282 0.82051282 0.66666667 0.74358974 0.76923077 0.74358974 0.82051282] mean value: 0.7391666666666666 key: train_accuracy value: [0.88888889 0.86931818 0.88352273 0.86079545 0.875 0.87784091 0.88352273 0.88636364 0.875 0.875 ] mean value: 0.8775252525252525 key: test_fscore value: [0.76595745 0.66666667 0.69767442 0.84444444 0.84444444 0.64864865 0.7826087 0.82352941 0.75 0.82051282] mean value: 0.7644486997547066 key: train_fscore value: [0.90025575 0.88383838 0.89350649 0.87468031 0.88832487 0.89168766 0.89672544 0.89847716 0.8877551 0.88888889] mean value: 0.8904140058349286 key: test_precision value: [0.72 0.65217391 0.71428571 0.82608696 0.82608696 0.75 0.72 0.7 0.78947368 0.88888889] mean value: 0.7586996113472086 key: train_precision value: [0.88888889 0.86206897 0.89583333 0.86363636 0.87064677 0.87192118 0.87684729 0.885 0.87878788 0.87128713] mean value: 0.8764917797952135 key: test_recall value: [0.81818182 0.68181818 0.68181818 0.86363636 0.86363636 0.57142857 0.85714286 1. 0.71428571 0.76190476] mean value: 0.7813852813852814 key: train_recall value: [0.9119171 0.90673575 0.89119171 0.88601036 0.90673575 0.91237113 0.91752577 0.91237113 0.89690722 0.90721649] mean value: 0.9048982426152449 key: test_roc_auc value: [0.71464646 0.60561497 0.6644385 0.81417112 0.81417112 0.67460317 0.73412698 0.75 0.74603175 0.82539683] mean value: 0.7343200916730328 key: train_roc_auc value: [0.8863383 0.86531756 0.88270277 0.85809952 0.87160687 0.87390709 0.87964896 0.88340076 0.87250424 0.87132977] mean value: 0.8744855833727446 key: test_jcc value: [0.62068966 0.5 0.53571429 0.73076923 0.73076923 0.48 0.64285714 0.7 0.6 0.69565217] mean value: 0.6236451719195347 key: train_jcc value: [0.81860465 0.7918552 0.80751174 0.77727273 0.79908676 0.80454545 0.81278539 0.8156682 0.79816514 0.8 ] mean value: 0.8025495260188461 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.4385066 0.42086315 0.38916779 0.36381245 0.39142966 0.44574833 0.50866032 0.32456923 0.28603816 0.34201574] mean value: 0.3910811424255371 key: score_time value: [0.02288485 0.04180908 0.01133728 0.02785611 0.03040051 0.02803445 0.01187062 0.07963872 0.01995134 0.02377748] mean value: 0.029756045341491698 key: test_mcc value: [0.34054054 0.36829757 0.32713229 0.74203177 0.58501794 0.28496141 0.54554473 0.46953014 0.58730159 0.65079365] mean value: 0.49011516351553097 key: train_mcc value: [0.69433091 0.67816125 0.76499433 0.65512331 0.68964641 0.70125462 0.67188139 0.68345012 0.71827166 0.74696087] mean value: 0.7004074872029109 key: test_accuracy value: [0.675 0.69230769 0.66666667 0.87179487 0.79487179 0.64102564 0.76923077 0.71794872 0.79487179 0.82051282] mean value: 0.744423076923077 key: train_accuracy value: [0.84900285 0.84090909 0.88352273 0.82954545 0.84659091 0.85227273 0.83806818 0.84375 0.86079545 0.875 ] mean value: 0.8519457394457395 key: test_fscore value: [0.71111111 0.76 0.69767442 0.88372093 0.83333333 0.65 0.80851064 0.78431373 0.80952381 0.82051282] mean value: 0.7758700787106352 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:115: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:118: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.86582278 0.85858586 0.89350649 0.84693878 0.86294416 0.87064677 0.85642317 0.86075949 0.87719298 0.88888889] mean value: 0.8681709379837828 key: test_precision value: [0.69565217 0.67857143 0.71428571 0.9047619 0.76923077 0.68421053 0.73076923 0.66666667 0.80952381 0.88888889] mean value: 0.7542561112927245 key: train_precision value: [0.84653465 0.83743842 0.89583333 0.83417085 0.84577114 0.84134615 0.83743842 0.84577114 0.85365854 0.87128713] mean value: 0.8509249796062281 key: test_recall value: [0.72727273 0.86363636 0.68181818 0.86363636 0.90909091 0.61904762 0.9047619 0.95238095 0.80952381 0.76190476] mean value: 0.8093073593073593 key: train_recall value: [0.88601036 0.88082902 0.89119171 0.86010363 0.88082902 0.90206186 0.87628866 0.87628866 0.90206186 0.90721649] mean value: 0.8862881256343144 key: test_roc_auc value: [0.66919192 0.6671123 0.6644385 0.87299465 0.77807487 0.64285714 0.75793651 0.6984127 0.79365079 0.82539683] mean value: 0.7370066208301502 key: train_roc_auc value: [0.84490392 0.83664092 0.88270277 0.82627823 0.84293023 0.84660055 0.83371395 0.84004306 0.85609422 0.87132977] mean value: 0.8481237618857027 key: test_jcc value: [0.55172414 0.61290323 0.53571429 0.79166667 0.71428571 0.48148148 0.67857143 0.64516129 0.68 0.69565217] mean value: 0.6387160404692687 key: train_jcc value: [0.76339286 0.75221239 0.80751174 0.73451327 0.75892857 0.77092511 0.74889868 0.75555556 0.78125 0.8 ] mean value: 0.7673188173479255 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04330349 0.08664036 0.11152577 0.2121923 0.12195444 0.11751413 0.06918955 0.14634132 0.18964505 0.07947135] mean value: 0.11777777671813965 key: score_time value: [0.01222849 0.01216006 0.0142653 0.02318764 0.01702595 0.01708937 0.01801968 0.01644063 0.0302918 0.02245378] mean value: 0.018316268920898438 key: test_mcc value: [0.53796222 0.30666041 0.58824786 0.72451364 0.62770563 0.25490741 0.81701092 0.61748053 0.2270149 0.63732414] mean value: 0.5338827639400422 key: train_mcc value: [0.71583503 0.73666076 0.72106756 0.69510176 0.72092837 0.75196987 0.67442109 0.66929302 0.73643866 0.72106756] mean value: 0.7142783690459764 key: test_accuracy value: [0.76744186 0.65116279 0.79069767 0.86046512 0.81395349 0.62790698 0.90697674 0.79069767 0.60465116 0.81395349] mean value: 0.7627906976744185 key: train_accuracy value: [0.85788114 0.86821705 0.86046512 0.84754522 0.86046512 0.87596899 0.8372093 0.83462532 0.86821705 0.86046512] mean value: 0.8571059431524548 key: test_fscore value: [0.76190476 0.69387755 0.7804878 0.85714286 0.81818182 0.6 0.9 0.81632653 0.65306122 0.78947368] mean value: 0.7670456232440461 key: train_fscore value: [0.85639687 0.86614173 0.85863874 0.84754522 0.86010363 0.87692308 0.83804627 0.83419689 0.8688946 0.8622449 ] mean value: 0.8569131929270901 key: test_precision value: [0.8 0.62962963 0.84210526 0.9 0.81818182 0.63157895 0.94736842 0.71428571 0.57142857 0.88235294] mean value: 0.7736931306281152 key: train_precision value: [0.86315789 0.87765957 0.86772487 0.84536082 0.86010363 0.87244898 0.83589744 0.83854167 0.86666667 0.85353535] mean value: 0.8581096890973028 key: test_recall value: [0.72727273 0.77272727 0.72727273 0.81818182 0.81818182 0.57142857 0.85714286 0.95238095 0.76190476 0.71428571] mean value: 0.772077922077922 key: train_recall value: [0.84974093 0.85492228 0.84974093 0.84974093 0.86010363 0.8814433 0.84020619 0.82989691 0.87113402 0.87113402] mean value: 0.8558063137652904 key: test_roc_auc value: [0.76839827 0.6482684 0.79220779 0.86147186 0.81385281 0.62662338 0.90584416 0.79437229 0.60822511 0.81168831] mean value: 0.7630952380952382 key: train_roc_auc value: [0.85786016 0.86818279 0.86043748 0.84755088 0.86046418 0.87595481 0.83720154 0.83463757 0.8682095 0.86043748] mean value: 0.8570936381603547 key: test_jcc value: [0.61538462 0.53125 0.64 0.75 0.69230769 0.42857143 0.81818182 0.68965517 0.48484848 0.65217391] mean value: 0.630237312475131 key: train_jcc value: [0.74885845 0.76388889 0.75229358 0.73542601 0.75454545 0.78082192 0.72123894 0.71555556 0.76818182 0.75784753] mean value: 0.7498658141104166 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [3.27406001 2.85859323 2.05400014 3.12286806 2.90991402 2.67471933 3.62207961 3.69394207 2.33394408 1.94188452] mean value: 2.8486005067825317 key: score_time value: [0.01242566 0.01069951 0.01267076 0.01221442 0.02676773 0.01851606 0.02073812 0.02033973 0.02364993 0.01231551] mean value: 0.0170337438583374 key: test_mcc value: [0.53796222 0.35185603 0.50454827 0.62770563 0.58225108 0.25490741 0.86117339 0.55959928 0.16887427 0.62964308] mean value: 0.5078520662310667 key: train_mcc value: [0.78298032 0.84496021 0.82447378 0.64869633 0.7726435 0.80361626 0.63824048 0.63824048 0.67442978 0.59764154] mean value: 0.7225922669452046 key: test_accuracy value: [0.76744186 0.6744186 0.74418605 0.81395349 0.79069767 0.62790698 0.93023256 0.76744186 0.58139535 0.81395349] mean value: 0.7511627906976744 key: train_accuracy value: [0.89147287 0.92248062 0.9121447 0.82428941 0.88630491 0.90180879 0.81912145 0.81912145 0.8372093 0.79844961] mean value: 0.8612403100775194 key: test_fscore value: [0.76190476 0.70833333 0.71794872 0.81818182 0.79069767 0.6 0.92682927 0.79166667 0.60869565 0.8 ] mean value: 0.7524257892920498 key: train_fscore value: [0.890625 0.92227979 0.91282051 0.82198953 0.88541667 0.90206186 0.81958763 0.81958763 0.8372093 0.8040201 ] mean value: 0.861559801725926 key: test_precision value: [0.8 0.65384615 0.82352941 0.81818182 0.80952381 0.63157895 0.95 0.7037037 0.56 0.84210526] mean value: 0.7592469107546507 key: train_precision value: [0.89528796 0.92227979 0.9035533 0.83068783 0.89005236 0.90206186 0.81958763 0.81958763 0.83937824 0.78431373] mean value: 0.8606790314296683 key: test_recall value: [0.72727273 0.77272727 0.63636364 0.81818182 0.77272727 0.57142857 0.9047619 0.9047619 0.66666667 0.76190476] mean value: 0.7536796536796537 key: train_recall value: [0.88601036 0.92227979 0.92227979 0.8134715 0.88082902 0.90206186 0.81958763 0.81958763 0.83505155 0.82474227] mean value: 0.8625901394156295 key: test_roc_auc value: [0.76839827 0.67207792 0.74675325 0.81385281 0.79112554 0.62662338 0.92965368 0.77056277 0.58333333 0.81277056] mean value: 0.7515151515151515 key: train_roc_auc value: [0.89145879 0.9224801 0.91217082 0.82426152 0.8862908 0.90180813 0.81912024 0.81912024 0.83721489 0.7983815 ] mean value: 0.8612307034880615 key: test_jcc value: [0.61538462 0.5483871 0.56 0.69230769 0.65384615 0.42857143 0.86363636 0.65517241 0.4375 0.66666667] mean value: 0.6121472430980217 key: train_jcc value: [0.8028169 0.85576923 0.83962264 0.69777778 0.79439252 0.82159624 0.69432314 0.69432314 0.72 0.67226891] mean value: 0.7592890514733467 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0136497 0.01171947 0.01167536 0.01198244 0.01169252 0.01167893 0.01165748 0.01183629 0.01165485 0.01159739] mean value: 0.011914443969726563 key: score_time value: [0.01215506 0.01054978 0.01036692 0.01034522 0.01050973 0.01032734 0.01033878 0.01041269 0.01039958 0.01040697] mean value: 0.010581207275390626 key: test_mcc value: [ 0.4912706 -0.08925021 0.44227524 0.2567 0.41330345 0.21351219 0.57282196 0.31757311 0.44054301 0.16485939] mean value: 0.322360873390043 key: train_mcc value: [0.3592453 0.43986881 0.39058415 0.373932 0.37829733 0.42490536 0.39988821 0.38683093 0.40827403 0.3976743 ] mean value: 0.3959500432696024 key: test_accuracy value: [0.74418605 0.46511628 0.72093023 0.62790698 0.69767442 0.60465116 0.74418605 0.62790698 0.69767442 0.58139535] mean value: 0.6511627906976745 key: train_accuracy value: [0.66925065 0.71317829 0.68475452 0.66666667 0.67958656 0.70542636 0.69250646 0.68475452 0.69767442 0.68992248] mean value: 0.6883720930232559 key: test_fscore value: [0.76595745 0.56603774 0.73913043 0.66666667 0.74509804 0.62222222 0.79245283 0.7037037 0.74509804 0.59090909] mean value: 0.6937276209561912 key: train_fscore value: [0.71555556 0.74364896 0.72767857 0.72727273 0.72197309 0.73972603 0.73015873 0.7264574 0.73226545 0.7309417 ] mean value: 0.7295678216085548 key: test_precision value: [0.72 0.48387097 0.70833333 0.61538462 0.65517241 0.58333333 0.65625 0.57575758 0.63333333 0.56521739] mean value: 0.6196652963981578 key: train_precision value: [0.62645914 0.67083333 0.63921569 0.61428571 0.63636364 0.66393443 0.65182186 0.64285714 0.65843621 0.6468254 ] mean value: 0.6451032556478061 key: test_recall value: [0.81818182 0.68181818 0.77272727 0.72727273 0.86363636 0.66666667 1. 0.9047619 0.9047619 0.61904762] mean value: 0.7958874458874459 key: train_recall value: [0.83419689 0.83419689 0.84455959 0.89119171 0.83419689 0.83505155 0.82989691 0.83505155 0.82474227 0.84020619] mean value: 0.8403290422520164 key: test_roc_auc value: [0.74242424 0.45995671 0.71969697 0.62554113 0.69372294 0.60606061 0.75 0.63419913 0.70238095 0.58225108] mean value: 0.6516233766233767 key: train_roc_auc value: [0.66967577 0.7134902 0.68516639 0.66724534 0.67998504 0.70509054 0.69215053 0.68436515 0.69734523 0.68953314] mean value: 0.6884047326531703 key: test_jcc value: [0.62068966 0.39473684 0.5862069 0.5 0.59375 0.4516129 0.65625 0.54285714 0.59375 0.41935484] mean value: 0.5359208278622027 key: train_jcc value: [0.55709343 0.59191176 0.57192982 0.57142857 0.56491228 0.58695652 0.575 0.57042254 0.57761733 0.57597173] mean value: 0.5743243983922165 MCC on Blind test: 0.26 Accuracy on Blind test: 0.64 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01208615 0.0118084 0.0119195 0.01202059 0.01185799 0.01190257 0.01189137 0.01201367 0.01191545 0.01200342] mean value: 0.011941909790039062 key: score_time value: [0.01037812 0.01057816 0.01038575 0.01036143 0.01032662 0.01039433 0.01032615 0.01039243 0.01032019 0.0105021 ] mean value: 0.010396528244018554 key: test_mcc value: [0.4517935 0.02169203 0.40939224 0.4517935 0.35185603 0.11496773 0.81385281 0.44468651 0.2270149 0.30265778] mean value: 0.3589707032809893 key: train_mcc value: [0.41603013 0.4677316 0.4367638 0.44185674 0.44703809 0.44185674 0.43671867 0.44252245 0.43195073 0.45219272] mean value: 0.44146616665156013 key: test_accuracy value: [0.72093023 0.51162791 0.69767442 0.72093023 0.6744186 0.55813953 0.90697674 0.72093023 0.60465116 0.65116279] mean value: 0.6767441860465117 key: train_accuracy value: [0.70801034 0.73385013 0.71834625 0.72093023 0.72351421 0.72093023 0.71834625 0.72093023 0.71576227 0.72609819] mean value: 0.720671834625323 key: test_fscore value: [0.7 0.53333333 0.66666667 0.7 0.70833333 0.53658537 0.9047619 0.72727273 0.65306122 0.61538462] mean value: 0.6745399171096035 key: train_fscore value: [0.70801034 0.7310705 0.71979434 0.72020725 0.72351421 0.72164948 0.72122762 0.71428571 0.71052632 0.72680412] mean value: 0.7197089902052173 key: test_precision value: [0.77777778 0.52173913 0.76470588 0.77777778 0.65384615 0.55 0.9047619 0.69565217 0.57142857 0.66666667] mean value: 0.6884356038959619 key: train_precision value: [0.70618557 0.73684211 0.71428571 0.72020725 0.72164948 0.72164948 0.71573604 0.73369565 0.72580645 0.72680412] mean value: 0.722286187762465 key: test_recall value: [0.63636364 0.54545455 0.59090909 0.63636364 0.77272727 0.52380952 0.9047619 0.76190476 0.76190476 0.57142857] mean value: 0.6705627705627706 key: train_recall value: [0.70984456 0.7253886 0.7253886 0.72020725 0.7253886 0.72164948 0.72680412 0.69587629 0.69587629 0.72680412] mean value: 0.7173227925858662 key: test_roc_auc value: [0.72294372 0.51082251 0.70021645 0.72294372 0.67207792 0.55735931 0.90692641 0.72186147 0.60822511 0.64935065] mean value: 0.6772727272727272 key: train_roc_auc value: [0.70801506 0.73382832 0.7183644 0.72092837 0.72351904 0.72092837 0.71832434 0.72099514 0.71581379 0.72609636] mean value: 0.7206813204422841 key: test_jcc value: [0.53846154 0.36363636 0.5 0.53846154 0.5483871 0.36666667 0.82608696 0.57142857 0.48484848 0.44444444] mean value: 0.518242166124354 key: train_jcc value: [0.548 0.57613169 0.562249 0.56275304 0.56680162 0.56451613 0.564 0.55555556 0.55102041 0.5708502 ] mean value: 0.5621877634277408 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01142979 0.0113678 0.01118851 0.01128006 0.01139712 0.01154923 0.01103115 0.01123428 0.0113554 0.01163101] mean value: 0.011346435546875 key: score_time value: [0.02301836 0.0221324 0.02117586 0.02700591 0.01861501 0.02525496 0.02225375 0.02616882 0.02647042 0.02867126] mean value: 0.024076676368713378 key: test_mcc value: [ 0.25541126 -0.02614435 0.12939849 0.44468651 0.4517935 -0.11404496 0.16233766 0.34848485 0.26318068 0.30265778] mean value: 0.22177614182156274 key: train_mcc value: [0.55559413 0.56087856 0.53050168 0.54551531 0.54025969 0.61240352 0.5507031 0.55039795 0.55564759 0.51951804] mean value: 0.5521419571533892 key: test_accuracy value: [0.62790698 0.48837209 0.55813953 0.72093023 0.72093023 0.44186047 0.58139535 0.6744186 0.62790698 0.65116279] mean value: 0.6093023255813953 key: train_accuracy value: [0.77777778 0.78036176 0.76485788 0.77260982 0.77002584 0.80620155 0.7751938 0.7751938 0.77777778 0.75968992] mean value: 0.775968992248062 key: test_fscore value: [0.63636364 0.52173913 0.48648649 0.71428571 0.7 0.47826087 0.57142857 0.66666667 0.65217391 0.61538462] mean value: 0.6042789603659169 key: train_fscore value: [0.77835052 0.77690289 0.75733333 0.7755102 0.77237852 0.80719794 0.77974684 0.7751938 0.78061224 0.76335878] mean value: 0.7766585057503326 key: test_precision value: [0.63636364 0.5 0.6 0.75 0.77777778 0.44 0.57142857 0.66666667 0.6 0.66666667] mean value: 0.6208903318903318 key: train_precision value: [0.77435897 0.78723404 0.78021978 0.7638191 0.76262626 0.80512821 0.76616915 0.77720207 0.77272727 0.75376884] mean value: 0.7743253704079895 key: test_recall value: [0.63636364 0.54545455 0.40909091 0.68181818 0.63636364 0.52380952 0.57142857 0.66666667 0.71428571 0.57142857] mean value: 0.5956709956709957 key: train_recall value: [0.78238342 0.76683938 0.7357513 0.78756477 0.78238342 0.80927835 0.79381443 0.77319588 0.78865979 0.77319588] mean value: 0.7793066609689653 key: test_roc_auc value: [0.62770563 0.48701299 0.56168831 0.72186147 0.72294372 0.44372294 0.58116883 0.67424242 0.62987013 0.64935065] mean value: 0.6099567099567099 key: train_roc_auc value: [0.77778965 0.78032691 0.76478286 0.77264836 0.77005769 0.80619358 0.77514556 0.77519897 0.77774959 0.75965493] mean value: 0.7759548101062977 key: test_jcc value: [0.46666667 0.35294118 0.32142857 0.55555556 0.53846154 0.31428571 0.4 0.5 0.48387097 0.44444444] mean value: 0.43776546350550144 key: train_jcc value: [0.6371308 0.63519313 0.60944206 0.63333333 0.62916667 0.67672414 0.63900415 0.63291139 0.64016736 0.61728395] mean value: 0.6350356989168522 MCC on Blind test: 0.2 Accuracy on Blind test: 0.61 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02410388 0.0238421 0.02404499 0.02403641 0.02398729 0.02347827 0.02385569 0.02381396 0.02365422 0.02424955] mean value: 0.023906636238098144 key: score_time value: [0.01353765 0.01356936 0.01376414 0.0136354 0.01343894 0.01350808 0.01371026 0.01338792 0.01352549 0.01347947] mean value: 0.013555669784545898 key: test_mcc value: [0.4517935 0.34859132 0.55959928 0.73471273 0.62770563 0.25490741 0.67462198 0.53796222 0.21351219 0.59541363] mean value: 0.49988198788408944 key: train_mcc value: [0.7371987 0.71733232 0.706524 0.6858155 0.69022752 0.71625569 0.69518417 0.6908848 0.73191874 0.72659748] mean value: 0.7097938919329834 key: test_accuracy value: [0.72093023 0.6744186 0.76744186 0.86046512 0.81395349 0.62790698 0.8372093 0.76744186 0.60465116 0.79069767] mean value: 0.7465116279069768 key: train_accuracy value: [0.86821705 0.85788114 0.85271318 0.84237726 0.84496124 0.85788114 0.84754522 0.84496124 0.86563307 0.8630491 ] mean value: 0.8545219638242894 key: test_fscore value: [0.7 0.69565217 0.73684211 0.85 0.81818182 0.6 0.82926829 0.77272727 0.62222222 0.75675676] mean value: 0.7381650641747198 key: train_fscore value: [0.86472149 0.85254692 0.848 0.83733333 0.84210526 0.85564304 0.84675325 0.84126984 0.86315789 0.86089239] mean value: 0.8512423414623245 key: test_precision value: [0.77777778 0.66666667 0.875 0.94444444 0.81818182 0.63157895 0.85 0.73913043 0.58333333 0.875 ] mean value: 0.776111342255507 key: train_precision value: [0.88586957 0.88333333 0.87362637 0.86263736 0.85561497 0.87165775 0.85340314 0.86413043 0.88172043 0.87700535] mean value: 0.8708998715932164 key: test_recall value: [0.63636364 0.72727273 0.63636364 0.77272727 0.81818182 0.57142857 0.80952381 0.80952381 0.66666667 0.66666667] mean value: 0.7114718614718615 key: train_recall value: [0.84455959 0.8238342 0.8238342 0.8134715 0.82901554 0.84020619 0.84020619 0.81958763 0.84536082 0.84536082] mean value: 0.8325436675391272 key: test_roc_auc value: [0.72294372 0.67316017 0.77056277 0.86255411 0.81385281 0.62662338 0.83658009 0.76839827 0.60606061 0.78787879] mean value: 0.7468614718614719 key: train_roc_auc value: [0.86815608 0.85779339 0.85263875 0.84230276 0.84492014 0.85792693 0.84756423 0.84502698 0.86568559 0.86309492] mean value: 0.8545109769777256 key: test_jcc value: [0.53846154 0.53333333 0.58333333 0.73913043 0.69230769 0.42857143 0.70833333 0.62962963 0.4516129 0.60869565] mean value: 0.5913409279152617 key: train_jcc value: [0.76168224 0.74299065 0.73611111 0.72018349 0.72727273 0.74770642 0.73423423 0.7260274 0.75925926 0.75576037] mean value: 0.7411227903254343 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [3.97078109 3.97416925 2.70587897 4.44103217 4.18598366 3.85128641 4.23357773 3.41070867 2.87078166 1.80410576] mean value: 3.5448305368423463 key: score_time value: [0.02938771 0.0144546 0.02191424 0.02190518 0.02358532 0.02501225 0.01258945 0.03111053 0.01438427 0.01988482] mean value: 0.021422839164733885 key: test_mcc value: [0.54609991 0.35185603 0.49456394 0.67532468 0.58134627 0.34848485 0.72077922 0.32463131 0.3030303 0.74914918] mean value: 0.5095265685481636 key: train_mcc value: [0.9638374 0.97427611 0.96899204 0.98450896 0.95865605 0.9741727 0.98450937 0.9741727 0.97417339 0.96899204] mean value: 0.9726290770221656 key: test_accuracy value: [0.76744186 0.6744186 0.74418605 0.8372093 0.79069767 0.6744186 0.86046512 0.65116279 0.65116279 0.86046512] mean value: 0.7511627906976744 key: train_accuracy value: [0.98191214 0.9870801 0.98449612 0.99224806 0.97932817 0.9870801 0.99224806 0.9870801 0.9870801 0.98449612] mean value: 0.9863049095607235 key: test_fscore value: [0.75 0.70833333 0.73170732 0.8372093 0.8 0.66666667 0.85714286 0.69387755 0.65116279 0.83333333] mean value: 0.7529433151593025 key: train_fscore value: [0.98191214 0.98694517 0.98445596 0.99220779 0.97927461 0.98714653 0.99224806 0.98714653 0.9870801 0.98453608] mean value: 0.9862952983546482 key: test_precision value: [0.83333333 0.65384615 0.78947368 0.85714286 0.7826087 0.66666667 0.85714286 0.60714286 0.63636364 1. ] mean value: 0.7683720741501062 key: train_precision value: [0.97938144 0.99473684 0.98445596 0.99479167 0.97927461 0.98461538 0.99481865 0.98461538 0.98963731 0.98453608] mean value: 0.9870863332273304 key: test_recall value: [0.68181818 0.77272727 0.68181818 0.81818182 0.81818182 0.66666667 0.85714286 0.80952381 0.66666667 0.71428571] mean value: 0.7487012987012986 key: train_recall value: [0.98445596 0.97927461 0.98445596 0.98963731 0.97927461 0.98969072 0.98969072 0.98969072 0.98453608 0.98453608] mean value: 0.9855242775492762 key: test_roc_auc value: [0.76948052 0.67207792 0.745671 0.83766234 0.79004329 0.67424242 0.86038961 0.6547619 0.65151515 0.85714286] mean value: 0.7512987012987014 key: train_roc_auc value: [0.9819187 0.98705999 0.98449602 0.99224133 0.97932803 0.98707334 0.99225469 0.98707334 0.98708669 0.98449602] mean value: 0.9863028150205652 key: test_jcc value: [0.6 0.5483871 0.57692308 0.72 0.66666667 0.5 0.75 0.53125 0.48275862 0.71428571] mean value: 0.6090271175339307 key: train_jcc value: [0.96446701 0.9742268 0.96938776 0.98453608 0.95939086 0.97461929 0.98461538 0.97461929 0.9744898 0.96954315] mean value: 0.972989541614236 MCC on Blind test: 0.29 Accuracy on Blind test: 0.64 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02931452 0.03400421 0.0351491 0.02511573 0.02355576 0.0299964 0.02605271 0.02237582 0.02702141 0.0285213 ] mean value: 0.028110694885253907 key: score_time value: [0.0128603 0.01022124 0.00969696 0.01189399 0.010005 0.00916767 0.00931692 0.01336789 0.00971723 0.00991654] mean value: 0.010616374015808106 key: test_mcc value: [0.44468651 0.34848485 0.44468651 0.54609991 0.81701092 0.21351219 0.723327 0.58824786 0.53463203 0.35868355] mean value: 0.5019371317009578 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.72093023 0.6744186 0.72093023 0.76744186 0.90697674 0.60465116 0.86046512 0.79069767 0.76744186 0.6744186 ] mean value: 0.7488372093023256 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.71428571 0.68181818 0.71428571 0.75 0.91304348 0.62222222 0.85 0.8 0.76190476 0.61111111] mean value: 0.7418671183888574 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.68181818 0.75 0.83333333 0.875 0.58333333 0.89473684 0.75 0.76190476 0.73333333] mean value: 0.7613459785828207 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.68181818 0.68181818 0.68181818 0.68181818 0.95454545 0.66666667 0.80952381 0.85714286 0.76190476 0.52380952] mean value: 0.7300865800865801 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.72186147 0.67424242 0.72186147 0.76948052 0.90584416 0.60606061 0.85930736 0.79220779 0.76731602 0.67099567] mean value: 0.7489177489177489 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.55555556 0.51724138 0.55555556 0.6 0.84 0.4516129 0.73913043 0.66666667 0.61538462 0.44 ] mean value: 0.5981147110481153 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.13478565 0.13071537 0.1360302 0.13517666 0.13477015 0.13016009 0.14609933 0.13371921 0.34426522 0.34663653] mean value: 0.17723584175109863 key: score_time value: [0.0183773 0.02117014 0.01810002 0.0189991 0.01942897 0.02047467 0.02483344 0.01833344 0.02377844 0.02443361] mean value: 0.02079291343688965 key: test_mcc value: [0.49456394 0.34848485 0.54609991 0.82901914 0.58134627 0.30151915 0.72077922 0.53796222 0.44155844 0.4912706 ] mean value: 0.529260373056181 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.74418605 0.6744186 0.76744186 0.90697674 0.79069767 0.65116279 0.86046512 0.76744186 0.72093023 0.74418605] mean value: 0.7627906976744185 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73170732 0.68181818 0.75 0.9 0.8 0.63414634 0.85714286 0.77272727 0.71428571 0.71794872] mean value: 0.755977640245933 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.78947368 0.68181818 0.83333333 1. 0.7826087 0.65 0.85714286 0.73913043 0.71428571 0.77777778] mean value: 0.7825570679003173 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.68181818 0.68181818 0.68181818 0.81818182 0.81818182 0.61904762 0.85714286 0.80952381 0.71428571 0.66666667] mean value: 0.7348484848484849 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.745671 0.67424242 0.76948052 0.90909091 0.79004329 0.6504329 0.86038961 0.76839827 0.72077922 0.74242424] mean value: 0.763095238095238 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57692308 0.51724138 0.6 0.81818182 0.66666667 0.46428571 0.75 0.62962963 0.55555556 0.56 ] mean value: 0.6138483840552806 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01684761 0.01725125 0.03084469 0.03082466 0.03178549 0.01522112 0.01549125 0.01581836 0.04029131 0.02556753] mean value: 0.0239943265914917 key: score_time value: [0.01382542 0.0134089 0.02095604 0.0213182 0.01256824 0.0129602 0.01235056 0.02648115 0.02352309 0.01250362] mean value: 0.016989541053771973 key: test_mcc value: [0.3030303 0.25490741 0.20995671 0.49456394 0.58134627 0.11982827 0.58225108 0.16485939 0.34859132 0.06695322] mean value: 0.31262879090530155 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.65116279 0.62790698 0.60465116 0.74418605 0.79069767 0.55813953 0.79069767 0.58139535 0.6744186 0.53488372] mean value: 0.6558139534883721 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.65116279 0.65217391 0.60465116 0.73170732 0.8 0.57777778 0.79069767 0.59090909 0.65 0.375 ] mean value: 0.6424079726710494 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.625 0.61904762 0.78947368 0.7826087 0.54166667 0.77272727 0.56521739 0.68421053 0.54545455] mean value: 0.6592073068045607 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.68181818 0.59090909 0.68181818 0.81818182 0.61904762 0.80952381 0.61904762 0.61904762 0.28571429] mean value: 0.6361471861471861 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.65151515 0.62662338 0.60497835 0.745671 0.79004329 0.55952381 0.79112554 0.58225108 0.67316017 0.52922078] mean value: 0.6554112554112554 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.48275862 0.48387097 0.43333333 0.57692308 0.66666667 0.40625 0.65384615 0.41935484 0.48148148 0.23076923] mean value: 0.4835254370161211 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.24 Accuracy on Blind test: 0.61 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.97463274 1.71187091 1.76081896 1.68633628 1.68973565 2.1227293 4.62051916 2.92394114 2.63578987 2.66768122] mean value: 2.4794055223464966 key: score_time value: [0.09138441 0.09405994 0.09149313 0.09162092 0.09041667 0.24971867 0.25352859 0.14864159 0.12520003 0.1521244 ] mean value: 0.1388188362121582 key: test_mcc value: [0.58824786 0.40291148 0.67532468 0.61748053 0.72077922 0.34859132 0.81701092 0.73471273 0.53463203 0.63732414] mean value: 0.6077014905508663 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.79069767 0.69767442 0.8372093 0.79069767 0.86046512 0.6744186 0.90697674 0.86046512 0.76744186 0.81395349] mean value: 0.8 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7804878 0.73469388 0.8372093 0.75675676 0.86363636 0.65 0.9 0.86956522 0.76190476 0.78947368] mean value: 0.7943727768654364 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.84210526 0.66666667 0.85714286 0.93333333 0.86363636 0.68421053 0.94736842 0.8 0.76190476 0.88235294] mean value: 0.8238721134386768 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.81818182 0.81818182 0.63636364 0.86363636 0.61904762 0.85714286 0.95238095 0.76190476 0.71428571] mean value: 0.7768398268398269 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.79220779 0.69480519 0.83766234 0.79437229 0.86038961 0.67316017 0.90584416 0.86255411 0.76731602 0.81168831] mean value: 0.8 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64 0.58064516 0.72 0.60869565 0.76 0.48148148 0.81818182 0.76923077 0.61538462 0.65217391] mean value: 0.6645793410786398 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.72556663 1.91589737 1.26766062 1.03932428 0.95912004 1.64210296 1.90016818 2.0203793 1.93958759 1.73655081] mean value: 1.6146357774734497 key: score_time value: [0.17945695 0.17857099 0.13251996 0.22311735 0.22053123 0.17586827 0.22016335 0.18613434 0.22368526 0.18402171] mean value: 0.19240694046020507 key: test_mcc value: [0.53463203 0.45629995 0.76839827 0.61748053 0.76789769 0.39696419 0.81701092 0.69486034 0.48807056 0.53595916] mean value: 0.6077573644581136 key: train_mcc value: [0.90182148 0.88142257 0.90717492 0.88114951 0.8914855 0.89664014 0.89158365 0.8914855 0.89158365 0.88123732] mean value: 0.8915584237957668 key: test_accuracy value: [0.76744186 0.72093023 0.88372093 0.79069767 0.88372093 0.69767442 0.90697674 0.8372093 0.74418605 0.76744186] mean value: 0.8 key: train_accuracy value: [0.95090439 0.94056848 0.95348837 0.94056848 0.94573643 0.94832041 0.94573643 0.94573643 0.94573643 0.94056848] mean value: 0.9457364341085271 key: test_fscore value: [0.77272727 0.76 0.88372093 0.75675676 0.88888889 0.66666667 0.9 0.85106383 0.73170732 0.75 ] mean value: 0.7961531662132548 key: train_fscore value: [0.95090439 0.93963255 0.95384615 0.94056848 0.94573643 0.94845361 0.94545455 0.94573643 0.94545455 0.94117647] mean value: 0.945696360595677 key: test_precision value: [0.77272727 0.67857143 0.9047619 0.93333333 0.86956522 0.72222222 0.94736842 0.76923077 0.75 0.78947368] mean value: 0.8137254253501394 key: train_precision value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( [0.94845361 0.95212766 0.94416244 0.93814433 0.94329897 0.94845361 0.95287958 0.94818653 0.95287958 0.93401015] mean value: 0.9462596454671948 key: test_recall value: [0.77272727 0.86363636 0.86363636 0.63636364 0.90909091 0.61904762 0.85714286 0.95238095 0.71428571 0.71428571] mean value: 0.7902597402597402 key: train_recall value: [0.95336788 0.92746114 0.96373057 0.94300518 0.94818653 0.94845361 0.93814433 0.94329897 0.93814433 0.94845361] mean value: 0.9452246140697612 key: test_roc_auc value: [0.76731602 0.71753247 0.88419913 0.79437229 0.88311688 0.69588745 0.90584416 0.83982684 0.74350649 0.76623377] mean value: 0.7997835497835498 key: train_roc_auc value: [0.95091074 0.94053469 0.95351477 0.94057476 0.94574275 0.94832007 0.9457561 0.94574275 0.9457561 0.94054805] mean value: 0.945740077987287 key: test_jcc value: [0.62962963 0.61290323 0.79166667 0.60869565 0.8 0.5 0.81818182 0.74074074 0.57692308 0.6 ] mean value: 0.6678740810122297 key: train_jcc value: [0.90640394 0.88613861 0.91176471 0.88780488 0.89705882 0.90196078 0.89655172 0.89705882 0.89655172 0.88888889] mean value: 0.897018290721652 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.03860426 0.03471041 0.02413154 0.01475096 0.01522779 0.01514721 0.01484656 0.01524067 0.01517296 0.0147686 ] mean value: 0.020260095596313477 key: score_time value: [0.02451944 0.02949691 0.01293755 0.01273441 0.01294351 0.01272106 0.01293421 0.01299977 0.01297522 0.01269722] mean value: 0.015695929527282715 key: test_mcc value: [0.4517935 0.02169203 0.40939224 0.4517935 0.35185603 0.11496773 0.81385281 0.44468651 0.2270149 0.30265778] mean value: 0.3589707032809893 key: train_mcc value: [0.41603013 0.4677316 0.4367638 0.44185674 0.44703809 0.44185674 0.43671867 0.44252245 0.43195073 0.45219272] mean value: 0.44146616665156013 key: test_accuracy value: [0.72093023 0.51162791 0.69767442 0.72093023 0.6744186 0.55813953 0.90697674 0.72093023 0.60465116 0.65116279] mean value: 0.6767441860465117 key: train_accuracy value: [0.70801034 0.73385013 0.71834625 0.72093023 0.72351421 0.72093023 0.71834625 0.72093023 0.71576227 0.72609819] mean value: 0.720671834625323 key: test_fscore value: [0.7 0.53333333 0.66666667 0.7 0.70833333 0.53658537 0.9047619 0.72727273 0.65306122 0.61538462] mean value: 0.6745399171096035 key: train_fscore value: [0.70801034 0.7310705 0.71979434 0.72020725 0.72351421 0.72164948 0.72122762 0.71428571 0.71052632 0.72680412] mean value: 0.7197089902052173 key: test_precision value: [0.77777778 0.52173913 0.76470588 0.77777778 0.65384615 0.55 0.9047619 0.69565217 0.57142857 0.66666667] mean value: 0.6884356038959619 key: train_precision value: [0.70618557 0.73684211 0.71428571 0.72020725 0.72164948 0.72164948 0.71573604 0.73369565 0.72580645 0.72680412] mean value: 0.722286187762465 key: test_recall value: [0.63636364 0.54545455 0.59090909 0.63636364 0.77272727 0.52380952 0.9047619 0.76190476 0.76190476 0.57142857] mean value: 0.6705627705627706 key: train_recall value: [0.70984456 0.7253886 0.7253886 0.72020725 0.7253886 0.72164948 0.72680412 0.69587629 0.69587629 0.72680412] mean value: 0.7173227925858662 key: test_roc_auc value: [0.72294372 0.51082251 0.70021645 0.72294372 0.67207792 0.55735931 0.90692641 0.72186147 0.60822511 0.64935065] mean value: 0.6772727272727272 key: train_roc_auc value: [0.70801506 0.73382832 0.7183644 0.72092837 0.72351904 0.72092837 0.71832434 0.72099514 0.71581379 0.72609636] mean value: 0.7206813204422841 key: test_jcc value: [0.53846154 0.36363636 0.5 0.53846154 0.5483871 0.36666667 0.82608696 0.57142857 0.48484848 0.44444444] mean value: 0.518242166124354 key: train_jcc value: [0.548 0.57613169 0.562249 0.56275304 0.56680162 0.56451613 0.564 0.55555556 0.55102041 0.5708502 ] mean value: 0.5621877634277408 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [5.84490728 7.99645638 8.01863933 8.01061749 6.95634055 3.86555934 2.4676311 2.49709249 2.40035462 2.49079871] mean value: 5.054839730262756 key: score_time value: [0.01869607 0.02224588 0.02842999 0.02292418 0.0223074 0.01271367 0.01389503 0.01313758 0.01270461 0.01333499] mean value: 0.0180389404296875 key: test_mcc value: [0.53463203 0.53595916 0.76839827 0.61748053 0.9544491 0.44468651 0.90692641 0.68193178 0.62770563 0.77418983] mean value: 0.6846359249520138 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.76744186 0.76744186 0.88372093 0.79069767 0.97674419 0.72093023 0.95348837 0.8372093 0.81395349 0.88372093] mean value: 0.8395348837209302 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.77272727 0.7826087 0.88372093 0.75675676 0.97777778 0.72727273 0.95238095 0.84444444 0.80952381 0.87179487] mean value: 0.8379008238563345 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.77272727 0.75 0.9047619 0.93333333 0.95652174 0.69565217 0.95238095 0.79166667 0.80952381 0.94444444] mean value: 0.8511012296881862 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77272727 0.81818182 0.86363636 0.63636364 1. 0.76190476 0.95238095 0.9047619 0.80952381 0.80952381] mean value: 0.8329004329004329 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76731602 0.76623377 0.88419913 0.79437229 0.97619048 0.72186147 0.9534632 0.83874459 0.81385281 0.88203463] mean value: 0.8398268398268398 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.62962963 0.64285714 0.79166667 0.60869565 0.95652174 0.57142857 0.90909091 0.73076923 0.68 0.77272727] mean value: 0.7293386814473771 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.59 Accuracy on Blind test: 0.79 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04299521 0.08748269 0.06289506 0.08957219 0.08889985 0.09818339 0.08479619 0.08244371 0.0967896 0.08396053] mean value: 0.08180184364318847 key: score_time value: [0.01770806 0.02164626 0.01231289 0.02033544 0.02275109 0.01549911 0.02078938 0.02091646 0.0211823 0.02512574] mean value: 0.019826674461364747 key: test_mcc value: [0.44468651 0.26856633 0.36709713 0.67532468 0.58557701 0.06753957 0.67532468 0.54609991 0.34848485 0.53595916] mean value: 0.45146598115905806 key: train_mcc value: [0.78838964 0.80887382 0.79846162 0.75718561 0.75711768 0.84496021 0.7726435 0.78310005 0.82954911 0.76744745] mean value: 0.7907728681165043 key: test_accuracy value: [0.72093023 0.62790698 0.6744186 0.8372093 0.79069767 0.53488372 0.8372093 0.76744186 0.6744186 0.76744186] mean value: 0.7232558139534884 key: train_accuracy value: [0.89405685 0.90439276 0.89922481 0.87855297 0.87855297 0.92248062 0.88630491 0.89147287 0.91472868 0.88372093] mean value: 0.8953488372093024 key: test_fscore value: [0.71428571 0.69230769 0.63157895 0.8372093 0.80851064 0.5 0.8372093 0.7826087 0.66666667 0.75 ] mean value: 0.7220376959229704 key: train_fscore value: [0.89514066 0.90339426 0.89922481 0.8772846 0.87855297 0.92268041 0.88717949 0.89285714 0.91560102 0.88431877] mean value: 0.8956234125406854 key: test_precision value: [0.75 0.6 0.75 0.85714286 0.76 0.52631579 0.81818182 0.72 0.66666667 0.78947368] mean value: 0.7237780815675552 key: train_precision value: [0.88383838 0.91052632 0.89690722 0.88421053 0.87628866 0.92268041 0.88265306 0.88383838 0.90862944 0.88205128] mean value: 0.8931623683341963 key: test_recall value: [0.68181818 0.81818182 0.54545455 0.81818182 0.86363636 0.47619048 0.85714286 0.85714286 0.66666667 0.71428571] mean value: 0.7298701298701299 key: train_recall value: [0.90673575 0.89637306 0.9015544 0.87046632 0.88082902 0.92268041 0.89175258 0.90206186 0.92268041 0.88659794] mean value: 0.8981731745099086 key: test_roc_auc value: [0.72186147 0.62337662 0.67748918 0.83766234 0.78896104 0.53354978 0.83766234 0.76948052 0.67424242 0.76623377] mean value: 0.7230519480519481 key: train_roc_auc value: [0.89408953 0.9043721 0.89923081 0.87853213 0.87855884 0.9224801 0.8862908 0.89144544 0.91470808 0.88371348] mean value: 0.8953421291597671 key: test_jcc value: [0.55555556 0.52941176 0.46153846 0.72 0.67857143 0.33333333 0.72 0.64285714 0.5 0.6 ] mean value: 0.5741267686561804 key: train_jcc value: [0.81018519 0.82380952 0.81690141 0.78139535 0.78341014 0.85645933 0.79723502 0.80645161 0.84433962 0.79262673] mean value: 0.811281392137182 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01140189 0.01118898 0.01079679 0.01073027 0.01068997 0.0107522 0.01070142 0.01034212 0.00967431 0.01074028] mean value: 0.010701823234558105 key: score_time value: [0.01017404 0.00979447 0.00948381 0.00952983 0.00951076 0.00952482 0.00906324 0.0088284 0.008775 0.00955725] mean value: 0.009424161911010743 key: test_mcc value: [ 0.68193178 -0.03178209 0.53463203 0.34848485 0.31401826 0.30151915 0.71509694 0.38097804 0.23794034 0.16485939] mean value: 0.3647678693597039 key: train_mcc value: [0.35534379 0.42731074 0.35464096 0.38612423 0.39519578 0.38066533 0.35342511 0.38859133 0.42122023 0.40584961] mean value: 0.38683671255257734 key: test_accuracy value: [0.8372093 0.48837209 0.76744186 0.6744186 0.65116279 0.65116279 0.8372093 0.6744186 0.60465116 0.58139535] mean value: 0.6767441860465117 key: train_accuracy value: [0.6744186 0.71059432 0.6744186 0.68992248 0.69509044 0.6873385 0.6744186 0.69250646 0.70801034 0.7002584 ] mean value: 0.6906976744186046 key: test_fscore value: [0.82926829 0.56 0.77272727 0.68181818 0.70588235 0.63414634 0.85714286 0.72 0.66666667 0.59090909] mean value: 0.7018561056351588 key: train_fscore value: [0.7014218 0.73205742 0.7 0.71428571 0.71634615 0.71394799 0.7 0.71325301 0.73031026 0.72380952] mean value: 0.7145431874278962 key: test_precision value: [0.89473684 0.5 0.77272727 0.68181818 0.62068966 0.65 0.75 0.62068966 0.56666667 0.56521739] mean value: 0.6622545664966559 key: train_precision value: [0.64628821 0.68 0.64757709 0.66079295 0.66816143 0.65938865 0.65044248 0.66968326 0.68 0.67256637] mean value: 0.6634900442401712 key: test_recall value: [0.77272727 0.63636364 0.77272727 0.68181818 0.81818182 0.61904762 1. 0.85714286 0.80952381 0.61904762] mean value: 0.7586580086580087 key: train_recall value: [0.76683938 0.79274611 0.76165803 0.77720207 0.77202073 0.77835052 0.75773196 0.7628866 0.78865979 0.78350515] mean value: 0.774160034186208 key: test_roc_auc value: [0.83874459 0.48484848 0.76731602 0.67424242 0.64718615 0.6504329 0.84090909 0.67857143 0.60930736 0.58225108] mean value: 0.6773809523809524 key: train_roc_auc value: [0.6746568 0.71080605 0.67464345 0.69014743 0.69528871 0.68710272 0.67420277 0.69232413 0.7078014 0.70004273] mean value: 0.690701618503285 key: test_jcc value: [0.70833333 0.38888889 0.62962963 0.51724138 0.54545455 0.46428571 0.75 0.5625 0.5 0.41935484] mean value: 0.5485688329612134 key: train_jcc value: [0.54014599 0.57735849 0.53846154 0.55555556 0.55805243 0.55514706 0.53846154 0.55430712 0.57518797 0.56716418] mean value: 0.5559841866860746 MCC on Blind test: 0.26 Accuracy on Blind test: 0.64 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01252246 0.01743793 0.01833606 0.03522992 0.01919341 0.02123332 0.02283287 0.01666498 0.02263021 0.02485061] mean value: 0.021093177795410156 key: score_time value: [0.00881982 0.011379 0.01184511 0.01217556 0.01201034 0.01208806 0.01207376 0.01197124 0.01207709 0.01200581] mean value: 0.011644577980041504 key: test_mcc value: [0.3543982 0.32779278 0.49347 0.63123793 0.369787 0.30666041 0.67988342 0.39343507 0.29669666 0.48934219] mean value: 0.4342703657405515 key: train_mcc value: [0.33082727 0.45943233 0.48092172 0.71620269 0.49709833 0.73700447 0.6802225 0.47378378 0.70801498 0.43564422] mean value: 0.551915229904182 key: test_accuracy value: [0.60465116 0.60465116 0.72093023 0.81395349 0.65116279 0.65116279 0.8372093 0.62790698 0.62790698 0.69767442] mean value: 0.6837209302325582 key: train_accuracy value: [0.5994832 0.67700258 0.6873385 0.85788114 0.69767442 0.8630491 0.8372093 0.68992248 0.84754522 0.65891473] mean value: 0.7416020671834626 key: test_fscore value: [0.37037037 0.72131148 0.77777778 0.80952381 0.73684211 0.59459459 0.82051282 0.72413793 0.69230769 0.55172414] mean value: 0.6799102714725577 key: train_fscore value: [0.32900433 0.75442043 0.76134122 0.85488127 0.76739563 0.85070423 0.82644628 0.76190476 0.86117647 0.484375 ] mean value: 0.7251649615674208 key: test_precision value: [1. 0.56410256 0.65625 0.85 0.6 0.6875 0.88888889 0.56756757 0.58064516 1. ] mean value: 0.7394954181849343 key: train_precision value: [1. 0.60759494 0.61464968 0.87096774 0.62258065 0.9378882 0.88757396 0.61935484 0.79220779 1. ] mean value: 0.7952817799506573 key: test_recall value: [0.22727273 1. 0.95454545 0.77272727 0.95454545 0.52380952 0.76190476 1. 0.85714286 0.38095238] mean value: 0.7432900432900433 key: train_recall value: [0.19689119 0.99481865 1. 0.83937824 1. 0.77835052 0.77319588 0.98969072 0.94329897 0.31958763] mean value: 0.7835211794241761 key: test_roc_auc value: [0.61363636 0.5952381 0.71536797 0.81493506 0.64393939 0.6482684 0.83549784 0.63636364 0.63311688 0.69047619] mean value: 0.6826839826839827 key: train_roc_auc value: [0.5984456 0.6778217 0.68814433 0.85783345 0.69845361 0.86326852 0.83737514 0.68914588 0.84729715 0.65979381] mean value: 0.7417579189145879 key: test_jcc value: [0.22727273 0.56410256 0.63636364 0.68 0.58333333 0.42307692 0.69565217 0.56756757 0.52941176 0.38095238] mean value: 0.5287733071288059 key: train_jcc value: [0.19689119 0.60567823 0.61464968 0.74654378 0.62258065 0.74019608 0.70422535 0.61538462 0.75619835 0.31958763] mean value: 0.5921935552542208 MCC on Blind test: 0.24 Accuracy on Blind test: 0.61 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02301335 0.02788758 0.02236009 0.0314877 0.01695895 0.02369475 0.02375484 0.02350283 0.02108383 0.01976204] mean value: 0.02335059642791748 key: score_time value: [0.01013947 0.01235223 0.01203251 0.01211309 0.01198268 0.02014351 0.01239824 0.01431727 0.01251864 0.01279259] mean value: 0.013079023361206055 key: test_mcc value: [0.43082022 0.30666041 0.50454827 0.41223987 0.369787 0.46619277 0.8276362 0.43082022 0.31757311 0.67462198] mean value: 0.4740900049490931 key: train_mcc value: [0.55190616 0.76754187 0.76829956 0.41878762 0.5091592 0.62622568 0.67412433 0.62438608 0.6676895 0.71275592] mean value: 0.6320875933190009 key: test_accuracy value: [0.65116279 0.65116279 0.74418605 0.6744186 0.65116279 0.69767442 0.90697674 0.65116279 0.62790698 0.8372093 ] mean value: 0.7093023255813954 key: train_accuracy value: [0.73385013 0.88372093 0.88372093 0.64857881 0.70542636 0.7881137 0.82687339 0.78294574 0.81395349 0.84754522] mean value: 0.7914728682170542 key: test_fscore value: [0.48275862 0.69387755 0.71794872 0.75 0.73684211 0.75471698 0.89473684 0.73684211 0.7037037 0.82926829] mean value: 0.7300694919809066 key: train_fscore value: [0.6360424 0.88431877 0.88607595 0.7394636 0.772 0.82327586 0.80351906 0.8212766 0.84140969 0.86310905] mean value: 0.8070490979544427 key: test_precision value: [1. 0.62962963 0.82352941 0.61764706 0.6 0.625 1. 0.58333333 0.57575758 0.85 ] mean value: 0.7304897009308774 key: train_precision value: [1. 0.87755102 0.86633663 0.58662614 0.6286645 0.70740741 0.93197279 0.69927536 0.73461538 0.78481013] mean value: 0.7817259359042723 key: test_recall value: [0.31818182 0.77272727 0.63636364 0.95454545 0.95454545 0.95238095 0.80952381 1. 0.9047619 0.80952381] mean value: 0.8112554112554112 key: train_recall value: [0.46632124 0.89119171 0.90673575 1. 1. 0.98453608 0.70618557 0.99484536 0.98453608 0.95876289] mean value: 0.8893114684044656 key: test_roc_auc value: [0.65909091 0.6482684 0.74675325 0.66774892 0.64393939 0.7034632 0.9047619 0.65909091 0.63419913 0.83658009] mean value: 0.7103896103896105 key: train_roc_auc value: [0.73316062 0.88374018 0.88378025 0.64948454 0.70618557 0.78760483 0.82718605 0.78239677 0.81351156 0.84725709] mean value: 0.7914307462208215 key: test_jcc value: [0.31818182 0.53125 0.56 0.6 0.58333333 0.60606061 0.80952381 0.58333333 0.54285714 0.70833333] mean value: 0.5842873376623376 key: train_jcc value: [0.46632124 0.79262673 0.79545455 0.58662614 0.6286645 0.6996337 0.67156863 0.6967509 0.72623574 0.75918367] mean value: 0.6823065796546106 MCC on Blind test: 0.41 Accuracy on Blind test: 0.7 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.19457316 0.17522335 0.17240524 0.16780663 0.16347218 0.16430736 0.16252875 0.16236234 0.16486979 0.16400599] mean value: 0.16915547847747803 key: score_time value: [0.01554251 0.01645303 0.01652455 0.01542091 0.01534796 0.0151906 0.01580358 0.01565909 0.01523995 0.01531696] mean value: 0.015649914741516113 key: test_mcc value: [0.54609991 0.20824344 0.53796222 0.4517935 0.81385281 0.34848485 0.81385281 0.4633482 0.35141081 0.76789769] mean value: 0.5302946234932158 key: train_mcc value: [0.92259409 0.93803584 0.92769572 0.96383644 0.93798408 0.94832007 0.90716502 0.92249139 0.95360082 0.91731211] mean value: 0.9339035586204503 key: test_accuracy value: [0.76744186 0.60465116 0.76744186 0.72093023 0.90697674 0.6744186 0.90697674 0.72093023 0.6744186 0.88372093] mean value: 0.7627906976744185 key: train_accuracy value: [0.96124031 0.96899225 0.96382429 0.98191214 0.96899225 0.97416021 0.95348837 0.96124031 0.97674419 0.95865633] mean value: 0.9669250645994832 key: test_fscore value: [0.75 0.62222222 0.76190476 0.7 0.90909091 0.66666667 0.9047619 0.75 0.68181818 0.87804878] mean value: 0.7624513426952452 key: train_fscore value: [0.96143959 0.96907216 0.96354167 0.98181818 0.96891192 0.9742268 0.95408163 0.96143959 0.9769821 0.95876289] mean value: 0.9670276528471051 key: test_precision value: [0.83333333 0.60869565 0.8 0.77777778 0.90909091 0.66666667 0.9047619 0.66666667 0.65217391 0.9 ] mean value: 0.7719166823514649 key: train_precision value: [0.95408163 0.96410256 0.96858639 0.984375 0.96891192 0.9742268 0.94444444 0.95897436 0.96954315 0.95876289] mean value: 0.9646009142637201 key: test_recall value: [0.68181818 0.63636364 0.72727273 0.63636364 0.90909091 0.66666667 0.9047619 0.85714286 0.71428571 0.85714286] mean value: 0.759090909090909 key: train_recall value: [0.96891192 0.97409326 0.95854922 0.97927461 0.96891192 0.9742268 0.96391753 0.96391753 0.98453608 0.95876289] mean value: 0.9695101757384755 key: test_roc_auc value: [0.76948052 0.6038961 0.76839827 0.72294372 0.90692641 0.67424242 0.90692641 0.72402597 0.67532468 0.88311688] mean value: 0.7635281385281385 key: train_roc_auc value: [0.96126008 0.9690054 0.96381069 0.98190535 0.96899204 0.97416003 0.95346135 0.96123337 0.976724 0.95865605] mean value: 0.966920837562096 key: test_jcc value: [0.6 0.4516129 0.61538462 0.53846154 0.83333333 0.5 0.82608696 0.6 0.51724138 0.7826087 ] mean value: 0.6264729421889551 key: train_jcc value: [0.92574257 0.94 0.92964824 0.96428571 0.93969849 0.94974874 0.91219512 0.92574257 0.955 0.92079208] mean value: 0.9362853541346641 MCC on Blind test: 0.41 Accuracy on Blind test: 0.7 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06695819 0.07240248 0.07764935 0.07833457 0.09272051 0.08303666 0.08197641 0.06399465 0.06653547 0.09358835] mean value: 0.07771966457366944 key: score_time value: [0.01832271 0.02029538 0.02448392 0.03695583 0.02700496 0.02261472 0.02670813 0.02114224 0.02746582 0.0180223 ] mean value: 0.024301600456237794 key: test_mcc value: [0.61748053 0.3030303 0.68193178 0.64040632 0.81778934 0.44701207 0.81701092 0.67462198 0.48807056 0.69166471] mean value: 0.6179018506275636 key: train_mcc value: [0.96919751 0.96414361 0.95885876 0.95865605 0.96445208 0.97938089 0.94878037 0.95380961 0.95870837 0.95452645] mean value: 0.9610513714125107 key: test_accuracy value: [0.79069767 0.65116279 0.8372093 0.81395349 0.90697674 0.72093023 0.90697674 0.8372093 0.74418605 0.8372093 ] mean value: 0.8046511627906977 key: train_accuracy value: [0.98449612 0.98191214 0.97932817 0.97932817 0.98191214 0.98966408 0.97416021 0.97674419 0.97932817 0.97674419] mean value: 0.9803617571059432 key: test_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [0.75675676 0.65116279 0.82926829 0.8 0.9047619 0.68421053 0.9 0.82926829 0.73170732 0.81081081] mean value: 0.7897946691781961 key: train_fscore value: [0.98429319 0.9816273 0.97905759 0.97927461 0.98153034 0.98963731 0.97382199 0.97650131 0.97927461 0.9762533 ] mean value: 0.9801271546598425 key: test_precision value: [0.93333333 0.66666667 0.89473684 0.88888889 0.95 0.76470588 0.94736842 0.85 0.75 0.9375 ] mean value: 0.8583200034399725 key: train_precision value: [0.99470899 0.99468085 0.98941799 0.97927461 1. 0.99479167 0.9893617 0.98941799 0.984375 1. ] mean value: 0.9916028804802093 key: test_recall value: [0.63636364 0.63636364 0.77272727 0.72727273 0.86363636 0.61904762 0.85714286 0.80952381 0.71428571 0.71428571] mean value: 0.7350649350649351 key: train_recall value: [0.97409326 0.96891192 0.96891192 0.97927461 0.96373057 0.98453608 0.95876289 0.96391753 0.9742268 0.95360825] mean value: 0.9689973826184499 key: test_roc_auc value: [0.79437229 0.65151515 0.83874459 0.81601732 0.90800866 0.71861472 0.90584416 0.83658009 0.74350649 0.83441558] mean value: 0.8047619047619048 key: train_roc_auc value: [0.98446931 0.98187864 0.97930132 0.97932803 0.98186528 0.98967737 0.9742001 0.97677742 0.97934138 0.97680412] mean value: 0.9803642967790182 key: test_jcc value: [0.60869565 0.48275862 0.70833333 0.66666667 0.82608696 0.52 0.81818182 0.70833333 0.57692308 0.68181818] mean value: 0.6597797639641718 key: train_jcc value: [0.96907216 0.96391753 0.95897436 0.95939086 0.96373057 0.97948718 0.94897959 0.95408163 0.95939086 0.95360825] mean value: 0.9610632996932176 MCC on Blind test: 0.28 Accuracy on Blind test: 0.64 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.07039952 0.12239099 0.19194889 0.21874523 0.19765306 0.18567944 0.16427875 0.17072296 0.14521766 0.16551971] mean value: 0.16325562000274657 key: score_time value: [0.0144248 0.03488755 0.02736664 0.02442789 0.02979183 0.02604699 0.02822995 0.02328968 0.02602267 0.02613354] mean value: 0.026062154769897462 key: test_mcc value: [0.21351219 0.26106714 0.40088002 0.49456394 0.53796222 0.02380952 0.3071961 0.25541126 0.11688312 0.39696419] mean value: 0.3008249695754552 key: train_mcc value: [0.97427611 0.97427611 0.97427611 0.97937979 0.97427611 0.97932803 0.97417339 0.97938089 0.96904463 0.97427816] mean value: 0.9752689337126527 key: test_accuracy value: [0.60465116 0.62790698 0.69767442 0.74418605 0.76744186 0.51162791 0.65116279 0.62790698 0.55813953 0.69767442] mean value: 0.6488372093023256 key: train_accuracy value: [0.9870801 0.9870801 0.9870801 0.98966408 0.9870801 0.98966408 0.9870801 0.98966408 0.98449612 0.9870801 ] mean value: 0.9875968992248062 key: test_fscore value: [0.58536585 0.68 0.68292683 0.73170732 0.76190476 0.51162791 0.66666667 0.61904762 0.55813953 0.66666667] mean value: 0.6464053156146179 key: train_fscore value: [0.98694517 0.98694517 0.98694517 0.98958333 0.98694517 0.98969072 0.9870801 0.98963731 0.98445596 0.98701299] mean value: 0.9875241088454858 key: test_precision value: [0.63157895 0.60714286 0.73684211 0.78947368 0.8 0.5 0.625 0.61904762 0.54545455 0.72222222] mean value: 0.6576761980709349 key: train_precision value: [0.99473684 0.99473684 0.99473684 0.9947644 0.99473684 0.98969072 0.98963731 0.99479167 0.98958333 0.9947644 ] mean value: 0.9932179191581537 key: test_recall value: [0.54545455 0.77272727 0.63636364 0.68181818 0.72727273 0.52380952 0.71428571 0.61904762 0.57142857 0.61904762] mean value: 0.6411255411255411 key: train_recall value: [0.97927461 0.97927461 0.97927461 0.98445596 0.97927461 0.98969072 0.98453608 0.98453608 0.97938144 0.97938144] mean value: 0.9819080177340954 key: test_roc_auc value: [0.60606061 0.62445887 0.6991342 0.745671 0.76839827 0.51190476 0.6525974 0.62770563 0.55844156 0.69588745] mean value: 0.6490259740259741 key: train_roc_auc value: [0.98705999 0.98705999 0.98705999 0.98965066 0.98705999 0.98966401 0.98708669 0.98967737 0.98450937 0.98710005] mean value: 0.9875928102131296 key: test_jcc value: [0.4137931 0.51515152 0.51851852 0.57692308 0.61538462 0.34375 0.5 0.44827586 0.38709677 0.5 ] mean value: 0.4818893465688516 key: train_jcc value: [0.9742268 0.9742268 0.9742268 0.97938144 0.9742268 0.97959184 0.9744898 0.97948718 0.96938776 0.97435897] mean value: 0.975360420139507 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.65034485 0.64601135 0.64165354 0.64062381 0.63947058 0.6445632 0.64104247 0.64255095 0.63853455 0.63667583] mean value: 0.6421471118927002 key: score_time value: [0.00956774 0.00966597 0.00943089 0.009624 0.00955415 0.00924993 0.00980234 0.00949836 0.00945091 0.00953102] mean value: 0.009537529945373536 key: test_mcc value: [0.63123793 0.25490741 0.86929961 0.64040632 0.81701092 0.44155844 0.90692641 0.72451364 0.4912706 0.77418983] mean value: 0.6551321088445219 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.81395349 0.62790698 0.93023256 0.81395349 0.90697674 0.72093023 0.95348837 0.86046512 0.74418605 0.88372093] mean value: 0.8255813953488372 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.80952381 0.65217391 0.92682927 0.8 0.91304348 0.71428571 0.95238095 0.86363636 0.71794872 0.87179487] mean value: 0.822161708916746 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85 0.625 1. 0.88888889 0.875 0.71428571 0.95238095 0.82608696 0.77777778 0.94444444] mean value: 0.8453864734299517 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77272727 0.68181818 0.86363636 0.72727273 0.95454545 0.71428571 0.95238095 0.9047619 0.66666667 0.80952381] mean value: 0.8047619047619048 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.81493506 0.62662338 0.93181818 0.81601732 0.90584416 0.72077922 0.9534632 0.86147186 0.74242424 0.88203463] mean value: 0.8255411255411256 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68 0.48387097 0.86363636 0.66666667 0.84 0.55555556 0.90909091 0.76 0.56 0.77272727] mean value: 0.7091547735418703 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.54 Accuracy on Blind test: 0.76 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.04376793 0.03075194 0.03130722 0.03120637 0.03110909 0.03160024 0.03136516 0.03184676 0.03131175 0.03312492] mean value: 0.03273913860321045 key: score_time value: [0.0127399 0.01823902 0.0177846 0.01779556 0.01547074 0.01559162 0.01546383 0.0193727 0.01602268 0.02370071] mean value: 0.017218136787414552 key: test_mcc value: [0.44227524 0.16762131 0.16154396 0.16726499 0.57247033 0.21351219 0.51986413 0.46619277 0.31757311 0.4517935 ] mean value: 0.3480111526255688 key: train_mcc value: [0.62455205 0.68487412 0.60464608 0.56891602 0.55700705 0.79136899 0.6761983 0.69660603 0.66807973 0.57186378] mean value: 0.6444112162684469 key: test_accuracy value: [0.72093023 0.55813953 0.58139535 0.58139535 0.76744186 0.60465116 0.74418605 0.69767442 0.62790698 0.72093023] mean value: 0.6604651162790698 key: train_accuracy value: [0.78036176 0.81912145 0.76744186 0.74418605 0.73643411 0.88630491 0.81395349 0.82687339 0.80878553 0.74677003] mean value: 0.7930232558139535 key: test_fscore value: [0.73913043 0.68852459 0.625 0.65384615 0.80769231 0.62222222 0.7755102 0.75471698 0.7037037 0.73913043] mean value: 0.7109477032407248 key: train_fscore value: [0.81953291 0.84649123 0.81092437 0.79587629 0.79098361 0.89767442 0.84347826 0.85274725 0.83982684 0.79835391] mean value: 0.8295889083253458 key: test_precision value: [0.70833333 0.53846154 0.57692308 0.56666667 0.7 0.58333333 0.67857143 0.625 0.57575758 0.68 ] mean value: 0.6233046953046953 key: train_precision value: [0.6942446 0.7338403 0.6819788 0.6609589 0.65423729 0.81779661 0.72932331 0.74329502 0.7238806 0.66438356] mean value: 0.7103938995586828 key: test_recall value: [0.77272727 0.95454545 0.68181818 0.77272727 0.95454545 0.66666667 0.9047619 0.95238095 0.9047619 0.80952381] mean value: 0.8374458874458874 key: train_recall value: [1. 1. 1. 1. 1. 0.99484536 1. 1. 1. 1. ] mean value: 0.9994845360824742 key: test_roc_auc value: [0.71969697 0.5487013 0.57900433 0.57683983 0.76298701 0.60606061 0.7478355 0.7034632 0.63419913 0.72294372] mean value: 0.6601731601731602 key: train_roc_auc value: [0.78092784 0.81958763 0.76804124 0.74484536 0.7371134 0.88602372 0.8134715 0.82642487 0.80829016 0.74611399] mean value: 0.7930839698734042 key: test_jcc value: [0.5862069 0.525 0.45454545 0.48571429 0.67741935 0.4516129 0.63333333 0.60606061 0.54285714 0.5862069 ] mean value: 0.5548956873678786 key: train_jcc value: [0.6942446 0.7338403 0.6819788 0.6609589 0.65423729 0.81434599 0.72932331 0.74329502 0.7238806 0.66438356] mean value: 0.7100488376978518 MCC on Blind test: 0.05 Accuracy on Blind test: 0.55 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02356768 0.03810334 0.03743887 0.04197311 0.06063938 0.04091549 0.03609419 0.04485106 0.06518602 0.05394864] mean value: 0.04427177906036377 key: score_time value: [0.02559161 0.03329444 0.0238359 0.02330065 0.01614666 0.02104402 0.0379591 0.01286244 0.02467752 0.04604602] mean value: 0.02647583484649658 key: test_mcc value: [0.4517935 0.2567 0.54609991 0.67532468 0.62964308 0.25490741 0.81701092 0.59970431 0.31423621 0.58557701] mean value: 0.513099699901425 key: train_mcc value: [0.77265565 0.77859243 0.75711768 0.74723387 0.77265565 0.79332817 0.76745366 0.74686824 0.77787869 0.74174506] mean value: 0.7655529075706143 key: test_accuracy value: [0.72093023 0.62790698 0.76744186 0.8372093 0.81395349 0.62790698 0.90697674 0.79069767 0.65116279 0.79069767] mean value: 0.7534883720930232 key: train_accuracy value: [0.88630491 0.88888889 0.87855297 0.87338501 0.88630491 0.89664083 0.88372093 0.87338501 0.88888889 0.87080103] mean value: 0.882687338501292 key: test_fscore value: [0.7 0.66666667 0.75 0.8372093 0.82608696 0.6 0.9 0.80851064 0.68085106 0.76923077] mean value: 0.7538555396872416 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:136: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:139: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [0.88659794 0.88594164 0.87855297 0.8707124 0.88659794 0.89637306 0.88372093 0.87272727 0.88831169 0.87244898] mean value: 0.8821984821340805 key: test_precision value: [0.77777778 0.61538462 0.83333333 0.85714286 0.79166667 0.63157895 0.94736842 0.73076923 0.61538462 0.83333333] mean value: 0.7633739798213482 key: train_precision value: [0.88205128 0.9076087 0.87628866 0.88709677 0.88205128 0.90104167 0.88601036 0.87958115 0.89528796 0.86363636] mean value: 0.8860654196687076 key: test_recall value: [0.63636364 0.72727273 0.68181818 0.81818182 0.86363636 0.57142857 0.85714286 0.9047619 0.76190476 0.71428571] mean value: 0.7536796536796537 key: train_recall value: [0.89119171 0.86528497 0.88082902 0.85492228 0.89119171 0.89175258 0.8814433 0.86597938 0.8814433 0.8814433 ] mean value: 0.8785481544789274 key: test_roc_auc value: [0.72294372 0.62554113 0.76948052 0.83766234 0.81277056 0.62662338 0.90584416 0.79329004 0.65367965 0.78896104] mean value: 0.7536796536796537 key: train_roc_auc value: [0.8863175 0.88882805 0.87855884 0.87333743 0.8863175 0.89665349 0.88372683 0.8734042 0.88890818 0.87077346] mean value: 0.8826825490091341 key: test_jcc value: [0.53846154 0.5 0.6 0.72 0.7037037 0.42857143 0.81818182 0.67857143 0.51612903 0.625 ] mean value: 0.6128618949747981 key: train_jcc value: [0.7962963 0.7952381 0.78341014 0.77102804 0.7962963 0.81220657 0.79166667 0.77419355 0.79906542 0.77375566] mean value: 0.7893156727955775 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.33694553 0.3823595 0.36175776 0.33384323 0.39316416 0.4307847 0.36658239 0.29315281 0.39274859 0.34651494] mean value: 0.3637853622436523 key: score_time value: [0.0161376 0.01644969 0.02295113 0.02281094 0.02321029 0.02241588 0.01558924 0.02218533 0.02343941 0.02282286] mean value: 0.020801234245300292 key: test_mcc value: [0.49456394 0.2567 0.54609991 0.72451364 0.67462198 0.25490741 0.81701092 0.68193178 0.12392414 0.58557701] mean value: 0.515985070255154 key: train_mcc value: [0.70549739 0.77859243 0.75711768 0.68994899 0.70542988 0.79332817 0.65891245 0.65375781 0.71577373 0.68992041] mean value: 0.7148278931967736 key: test_accuracy value: [0.74418605 0.62790698 0.76744186 0.86046512 0.8372093 0.62790698 0.90697674 0.8372093 0.55813953 0.79069767] mean value: 0.7558139534883721 key: train_accuracy value: [0.85271318 0.88888889 0.87855297 0.84496124 0.85271318 0.89664083 0.82945736 0.82687339 0.85788114 0.84496124] mean value: 0.8573643410852713 key: test_fscore value: [0.73170732 0.66666667 0.75 0.85714286 0.84444444 0.6 0.9 0.84444444 0.59574468 0.76923077] mean value: 0.7559381179853416 key: train_fscore value: [0.85117493 0.88594164 0.87855297 0.84375 0.85194805 0.89637306 0.82989691 0.82687339 0.85788114 0.84536082] mean value: 0.8567752913729867 key: test_precision value: [0.78947368 0.61538462 0.83333333 0.9 0.82608696 0.63157895 0.94736842 0.79166667 0.53846154 0.83333333] mean value: 0.7706687496332805 key: train_precision value: [0.85789474 0.9076087 0.87628866 0.84816754 0.85416667 0.90104167 0.82989691 0.82901554 0.86010363 0.84536082] mean value: 0.8609544867831661 key: test_recall value: [0.68181818 0.72727273 0.68181818 0.81818182 0.86363636 0.57142857 0.85714286 0.9047619 0.66666667 0.71428571] mean value: 0.7487012987012986 key: train_recall value: [0.84455959 0.86528497 0.88082902 0.83937824 0.84974093 0.89175258 0.82989691 0.82474227 0.8556701 0.84536082] mean value: 0.8527215426526361 key: test_roc_auc value: [0.745671 0.62554113 0.76948052 0.86147186 0.83658009 0.62662338 0.90584416 0.83874459 0.56060606 0.78896104] mean value: 0.7559523809523809 key: train_roc_auc value: [0.85269216 0.88882805 0.87855884 0.84494685 0.85270552 0.89665349 0.82945623 0.82687891 0.85788687 0.84496021] mean value: 0.8573567117141179 key: test_jcc value: [0.57692308 0.5 0.6 0.75 0.73076923 0.42857143 0.81818182 0.73076923 0.42424242 0.625 ] mean value: 0.6184457209457209 key: train_jcc value: [0.74090909 0.7952381 0.78341014 0.72972973 0.74208145 0.81220657 0.7092511 0.70484581 0.75113122 0.73214286] mean value: 0.7500946070021391 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.05380988 0.07341051 0.05587864 0.05535698 0.04124856 0.03699017 0.03655267 0.05492473 0.03699756 0.0496068 ] mean value: 0.049477648735046384 key: score_time value: [0.01278281 0.01413512 0.0141964 0.01429844 0.01432276 0.01433253 0.01438165 0.01441407 0.01473069 0.01502562] mean value: 0.014262008666992187 key: test_mcc value: [0.40088002 0.44227524 0.44155844 0.67532468 0.62770563 0.39479486 0.81385281 0.48026322 0.35748709 0.69166471] mean value: 0.5325806698493886 key: train_mcc value: [0.71576614 0.73651032 0.71062262 0.7002564 0.71063807 0.73651032 0.69510176 0.71062262 0.71059238 0.69515975] mean value: 0.7121780365755461 key: test_accuracy value: [0.69767442 0.72093023 0.72093023 0.8372093 0.81395349 0.69767442 0.90697674 0.72093023 0.6744186 0.8372093 ] mean value: 0.7627906976744185 key: train_accuracy value: [0.85788114 0.86821705 0.85529716 0.8501292 0.85529716 0.86821705 0.84754522 0.85529716 0.85529716 0.84754522] mean value: 0.8560723514211886 key: test_fscore value: [0.68292683 0.73913043 0.72727273 0.8372093 0.81818182 0.68292683 0.9047619 0.76 0.69565217 0.81081081] mean value: 0.765887283058508 key: train_fscore value: [0.85714286 0.86684073 0.85416667 0.84974093 0.8556701 0.86956522 0.84754522 0.85641026 0.8556701 0.84910486] mean value: 0.8561856946482916 key: test_precision value: [0.73684211 0.70833333 0.72727273 0.85714286 0.81818182 0.7 0.9047619 0.65517241 0.64 0.9375 ] mean value: 0.7685207159748902 key: train_precision value: [0.859375 0.87368421 0.85863874 0.84974093 0.85128205 0.86294416 0.84974093 0.85204082 0.8556701 0.84263959] mean value: 0.855575654631333 key: test_recall value: [0.63636364 0.77272727 0.72727273 0.81818182 0.81818182 0.66666667 0.9047619 0.9047619 0.76190476 0.71428571] mean value: 0.7725108225108225 key: train_recall value: [0.85492228 0.86010363 0.84974093 0.84974093 0.86010363 0.87628866 0.84536082 0.86082474 0.8556701 0.8556701 ] mean value: 0.8568425831953421 key: test_roc_auc value: [0.6991342 0.71969697 0.72077922 0.83766234 0.81385281 0.6969697 0.90692641 0.72510823 0.67640693 0.83441558] mean value: 0.763095238095238 key: train_roc_auc value: [0.85787351 0.86819614 0.85528284 0.8501282 0.85530955 0.86819614 0.84755088 0.85528284 0.85529619 0.84752417] mean value: 0.8560640457240531 key: test_jcc value: [0.51851852 0.5862069 0.57142857 0.72 0.69230769 0.51851852 0.82608696 0.61290323 0.53333333 0.68181818] mean value: 0.6261121894804731 key: train_jcc value: [0.75 0.76497696 0.74545455 0.73873874 0.74774775 0.76923077 0.73542601 0.74887892 0.74774775 0.73777778] mean value: 0.7485979217958099 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.96443534 1.14559245 0.9931128 1.13871431 1.37451053 1.55276895 1.93655849 1.40987992 1.65614939 2.07050133] mean value: 1.4242223501205444 key: score_time value: [0.01451755 0.019454 0.01595926 0.01519442 0.0151484 0.01562023 0.01477695 0.01309204 0.02022529 0.01571345] mean value: 0.01597015857696533 key: test_mcc value: [0.44468651 0.25490741 0.49456394 0.63123793 0.62770563 0.25490741 0.81385281 0.57954841 0.27084605 0.69166471] mean value: 0.5063920803567467 key: train_mcc value: [0.76744745 0.74163306 0.81435868 0.75711119 0.74703465 0.86568201 0.73643866 0.65380918 0.76230669 0.74163306] mean value: 0.7587454624519377 key: test_accuracy value: [0.72093023 0.62790698 0.74418605 0.81395349 0.81395349 0.62790698 0.90697674 0.76744186 0.62790698 0.8372093 ] mean value: 0.7488372093023256 key: train_accuracy value: [0.88372093 0.87080103 0.90697674 0.87855297 0.87338501 0.93281654 0.86821705 0.82687339 0.88113695 0.87080103] mean value: 0.879328165374677 key: test_fscore value: [0.71428571 0.65217391 0.73170732 0.80952381 0.81818182 0.6 0.9047619 0.8 0.66666667 0.81081081] mean value: 0.7508111954347373 key: train_fscore value: [0.88311688 0.86979167 0.90816327 0.87792208 0.87468031 0.93264249 0.8688946 0.8286445 0.88205128 0.87179487] mean value: 0.8797701943631095 key: test_precision value: [0.75 0.625 0.78947368 0.85 0.81818182 0.63157895 0.9047619 0.68965517 0.59259259 0.9375 ] mean value: 0.7588744119529056 key: train_precision value: [0.88541667 0.87434555 0.89447236 0.88020833 0.86363636 0.9375 0.86666667 0.82233503 0.87755102 0.86734694] mean value: 0.876947892641468 key: test_recall value: [0.68181818 0.68181818 0.68181818 0.77272727 0.81818182 0.57142857 0.9047619 0.95238095 0.76190476 0.71428571] mean value: 0.7541125541125541 key: train_recall value: [0.88082902 0.86528497 0.92227979 0.87564767 0.88601036 0.92783505 0.87113402 0.83505155 0.88659794 0.87628866] mean value: 0.8826959029966348 key: test_roc_auc value: [0.72186147 0.62662338 0.745671 0.81493506 0.81385281 0.62662338 0.90692641 0.77164502 0.63095238 0.83441558] mean value: 0.7493506493506493 key: train_roc_auc value: [0.88371348 0.87078682 0.90701619 0.87854548 0.87341755 0.93282944 0.8682095 0.8268522 0.8811228 0.87078682] mean value: 0.8793280273489664 key: test_jcc value: [0.55555556 0.48387097 0.57692308 0.68 0.69230769 0.42857143 0.82608696 0.66666667 0.5 0.68181818] mean value: 0.6091800526106277 key: train_jcc value: [0.79069767 0.76958525 0.8317757 0.78240741 0.77727273 0.87378641 0.76818182 0.70742358 0.78899083 0.77272727] mean value: 0.786284866863972 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.02845478 0.01532006 0.01588511 0.01669621 0.01684451 0.01621389 0.01678681 0.01132989 0.01192236 0.01090431] mean value: 0.016035795211791992 key: score_time value: [0.02339506 0.01329732 0.01424932 0.01453495 0.01454854 0.01461339 0.01037192 0.01000857 0.0116744 0.00963879] mean value: 0.013633227348327637 key: test_mcc value: [ 0.44701207 -0.03967598 0.39696419 0.34859132 0.36986766 0.3071961 0.67883359 0.17877574 0.42224772 0.11982827] mean value: 0.32296406808134626 key: train_mcc value: [0.37165326 0.40315208 0.39862039 0.38778838 0.37968295 0.41896574 0.37598756 0.43916108 0.41539874 0.46520935] mean value: 0.40556195340462464 key: test_accuracy value: [0.72093023 0.48837209 0.69767442 0.6744186 0.6744186 0.65116279 0.81395349 0.55813953 0.69767442 0.55813953] mean value: 0.6534883720930232 key: train_accuracy value: [0.6744186 0.69509044 0.68992248 0.67183463 0.67958656 0.70284238 0.67958656 0.71317829 0.7002584 0.72609819] mean value: 0.6932816537467701 key: test_fscore value: [0.75 0.59259259 0.72340426 0.69565217 0.73076923 0.66666667 0.84 0.66666667 0.73469388 0.57777778] mean value: 0.6978223241256147 key: train_fscore value: [0.72123894 0.7281106 0.72972973 0.73263158 0.72321429 0.73684211 0.72197309 0.74482759 0.73636364 0.75576037] mean value: 0.7330691922190511 key: test_precision value: [0.69230769 0.5 0.68 0.66666667 0.63333333 0.625 0.72413793 0.52777778 0.64285714 0.54166667] mean value: 0.6233747210643762 key: train_precision value: [0.62934363 0.65560166 0.64541833 0.61702128 0.63529412 0.66255144 0.63888889 0.67219917 0.65853659 0.68333333] mean value: 0.6498188428072472 key: test_recall value: [0.81818182 0.72727273 0.77272727 0.72727273 0.86363636 0.71428571 1. 0.9047619 0.85714286 0.61904762] mean value: 0.8004329004329005 key: train_recall value: [0.84455959 0.81865285 0.83937824 0.9015544 0.83937824 0.82989691 0.82989691 0.83505155 0.83505155 0.84536082] mean value: 0.8418781048020939 key: test_roc_auc value: [0.71861472 0.48268398 0.69588745 0.67316017 0.66991342 0.6525974 0.81818182 0.56601732 0.7012987 0.55952381] mean value: 0.6537878787878788 key: train_roc_auc value: [0.67485711 0.6954089 0.69030768 0.67242669 0.6799984 0.70251322 0.67919716 0.71286256 0.69990919 0.72578922] mean value: 0.6933270124459163 key: test_jcc value: [0.6 0.42105263 0.56666667 0.53333333 0.57575758 0.5 0.72413793 0.5 0.58064516 0.40625 ] mean value: 0.5407843299661328 key: train_jcc value: [0.56401384 0.57246377 0.57446809 0.57807309 0.56643357 0.58333333 0.56491228 0.59340659 0.58273381 0.60740741] mean value: 0.5787245777986066 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01109862 0.01014376 0.01011562 0.0104208 0.01032352 0.01022625 0.0103817 0.01012492 0.01034141 0.01021791] mean value: 0.010339450836181641 key: score_time value: [0.00949383 0.00930595 0.00907779 0.00954795 0.00929666 0.00967145 0.00931907 0.00926375 0.00916362 0.00910592] mean value: 0.009324598312377929 key: test_mcc value: [0.48917749 0.06753957 0.20995671 0.50454827 0.39696419 0.3030303 0.72451364 0.36709713 0.36709713 0.34859132] mean value: 0.3778515749103558 key: train_mcc value: [0.46826734 0.51459683 0.46884804 0.49038014 0.49958596 0.51988165 0.4373134 0.47321307 0.45779106 0.48339175] mean value: 0.48132692433520785 key: test_accuracy value: [0.74418605 0.53488372 0.60465116 0.74418605 0.69767442 0.65116279 0.86046512 0.6744186 0.6744186 0.6744186 ] mean value: 0.686046511627907 key: train_accuracy value: [0.73385013 0.75710594 0.73385013 0.74418605 0.74935401 0.75968992 0.71834625 0.73643411 0.72868217 0.74160207] mean value: 0.7403100775193798 key: test_fscore value: [0.74418605 0.56521739 0.60465116 0.71794872 0.72340426 0.65116279 0.86363636 0.70833333 0.70833333 0.65 ] mean value: 0.6936873394875245 key: train_fscore value: [0.73924051 0.75132275 0.74185464 0.75434243 0.75566751 0.76574307 0.72681704 0.74242424 0.73551637 0.74619289] mean value: 0.7459121456577962 key: test_precision value: [0.76190476 0.54166667 0.61904762 0.82352941 0.68 0.63636364 0.82608696 0.62962963 0.62962963 0.68421053] mean value: 0.6832068837844177 key: train_precision value: [0.72277228 0.76756757 0.7184466 0.72380952 0.73529412 0.74876847 0.70731707 0.72772277 0.71921182 0.735 ] mean value: 0.7305910229208082 key: test_recall value: [0.72727273 0.59090909 0.59090909 0.63636364 0.77272727 0.66666667 0.9047619 0.80952381 0.80952381 0.61904762] mean value: 0.7127705627705627 key: train_recall value: [0.75647668 0.7357513 0.76683938 0.78756477 0.77720207 0.78350515 0.74742268 0.75773196 0.75257732 0.75773196] mean value: 0.762280326905614 key: test_roc_auc value: [0.74458874 0.53354978 0.60497835 0.74675325 0.69588745 0.65151515 0.86147186 0.67748918 0.67748918 0.67316017] mean value: 0.6866883116883117 key: train_roc_auc value: [0.73390845 0.75705091 0.73393515 0.74429785 0.74942578 0.75962822 0.71827093 0.73637893 0.72862027 0.74156028] mean value: 0.7403076758720154 key: test_jcc value: [0.59259259 0.39393939 0.43333333 0.56 0.56666667 0.48275862 0.76 0.5483871 0.5483871 0.48148148] mean value: 0.536754628225151 key: train_jcc value: [0.58634538 0.60169492 0.58964143 0.60557769 0.60728745 0.62040816 0.57086614 0.59036145 0.58167331 0.5951417 ] mean value: 0.5948997627637519 MCC on Blind test: 0.48 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00967765 0.01076436 0.0116446 0.01169658 0.0106988 0.01059484 0.0109911 0.01088476 0.01136351 0.01097918] mean value: 0.010929536819458009 key: score_time value: [0.01712894 0.0148201 0.01408362 0.01479387 0.01760268 0.01796579 0.01856637 0.0191288 0.01793575 0.01807547] mean value: 0.017010140419006347 key: test_mcc value: [ 0.21040933 0.06926407 0.2581351 0.48917749 0.49456394 -0.01790718 0.06926407 0.20995671 0.3071961 0.20835137] mean value: 0.22984110024173177 key: train_mcc value: [0.52028836 0.49354951 0.49874453 0.49401307 0.50391282 0.56729474 0.48475098 0.53489677 0.51938999 0.5093282 ] mean value: 0.51261689832722 key: test_accuracy value: [0.60465116 0.53488372 0.62790698 0.74418605 0.74418605 0.48837209 0.53488372 0.60465116 0.65116279 0.60465116] mean value: 0.613953488372093 key: train_accuracy value: [0.75968992 0.74677003 0.74935401 0.74677003 0.75193798 0.78294574 0.74160207 0.76744186 0.75968992 0.75452196] mean value: 0.7560723514211887 key: test_fscore value: [0.65306122 0.54545455 0.61904762 0.74418605 0.73170732 0.54166667 0.52380952 0.60465116 0.66666667 0.56410256] mean value: 0.6194353336612878 key: train_fscore value: [0.76574307 0.74479167 0.74673629 0.75126904 0.75257732 0.79104478 0.75247525 0.76923077 0.75968992 0.75949367] mean value: 0.7593051773504969 key: test_precision value: [0.59259259 0.54545455 0.65 0.76190476 0.78947368 0.48148148 0.52380952 0.59090909 0.625 0.61111111] mean value: 0.6171736791473633 key: train_precision value: [0.74509804 0.7486911 0.75263158 0.73631841 0.74871795 0.76442308 0.72380952 0.76530612 0.76165803 0.74626866] mean value: 0.7492922485303724 key: test_recall value: [0.72727273 0.54545455 0.59090909 0.72727273 0.68181818 0.61904762 0.52380952 0.61904762 0.71428571 0.52380952] mean value: 0.6272727272727273 key: train_recall value: [0.78756477 0.74093264 0.74093264 0.76683938 0.75647668 0.81958763 0.78350515 0.77319588 0.75773196 0.77319588] mean value: 0.7699962608834998 key: test_roc_auc value: [0.6017316 0.53463203 0.62878788 0.74458874 0.745671 0.49134199 0.53463203 0.60497835 0.6525974 0.60281385] mean value: 0.6141774891774892 key: train_roc_auc value: [0.75976176 0.74675498 0.7493323 0.74682175 0.75194968 0.78285081 0.74149351 0.76742695 0.75969499 0.75447359] mean value: 0.7560560333315528 key: test_jcc value: [0.48484848 0.375 0.44827586 0.59259259 0.57692308 0.37142857 0.35483871 0.43333333 0.5 0.39285714] mean value: 0.45300977737295867 key: train_jcc value: [0.62040816 0.593361 0.59583333 0.60162602 0.60330579 0.65432099 0.6031746 0.625 0.6125 0.6122449 ] mean value: 0.61217747826215 MCC on Blind test: 0.15 Accuracy on Blind test: 0.58 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02884865 0.03000712 0.03000569 0.0280571 0.0189383 0.02147675 0.01978326 0.01903105 0.02211714 0.01889801] mean value: 0.023716306686401366 key: score_time value: [0.01581597 0.01788545 0.0181725 0.01611257 0.01134253 0.0113802 0.01225686 0.0116148 0.01208067 0.01143336] mean value: 0.013809490203857421 key: test_mcc value: [0.39479486 0.44701207 0.25541126 0.68193178 0.58134627 0.3071961 0.68193178 0.50454827 0.4517935 0.5421681 ] mean value: 0.484813397777736 key: train_mcc value: [0.71577373 0.706524 0.71075971 0.68996555 0.69518417 0.7159805 0.69120159 0.70564037 0.73129624 0.69518417] mean value: 0.705751002787364 key: test_accuracy value: [0.69767442 0.72093023 0.62790698 0.8372093 0.79069767 0.65116279 0.8372093 0.74418605 0.72093023 0.76744186] mean value: 0.7395348837209302 key: train_accuracy value: [0.85788114 0.85271318 0.85529716 0.84496124 0.84754522 0.85788114 0.84496124 0.85271318 0.86563307 0.84754522] mean value: 0.8527131782945736 key: test_fscore value: [0.71111111 0.75 0.63636364 0.82926829 0.8 0.66666667 0.84444444 0.76595745 0.73913043 0.73684211] mean value: 0.7479784138123062 key: train_fscore value: [0.85788114 0.848 0.85641026 0.84536082 0.84832905 0.86005089 0.85 0.85496183 0.86666667 0.84675325] mean value: 0.8534413903012841 key: test_precision value: [0.69565217 0.69230769 0.63636364 0.89473684 0.7826087 0.625 0.79166667 0.69230769 0.68 0.82352941] mean value: 0.7314172811080875 key: train_precision value: [0.8556701 0.87362637 0.84771574 0.84102564 0.84183673 0.84924623 0.82524272 0.84422111 0.8622449 0.85340314] mean value: 0.8494232682929744 key: test_recall value: [0.72727273 0.81818182 0.63636364 0.77272727 0.81818182 0.71428571 0.9047619 0.85714286 0.80952381 0.66666667] mean value: 0.7725108225108225 key: train_recall value: [0.86010363 0.8238342 0.86528497 0.84974093 0.85492228 0.87113402 0.87628866 0.86597938 0.87113402 0.84020619] mean value: 0.8578628278403931 key: test_roc_auc value: [0.6969697 0.71861472 0.62770563 0.83874459 0.79004329 0.6525974 0.83874459 0.74675325 0.72294372 0.76515152] mean value: 0.7398268398268398 key: train_roc_auc value: [0.85788687 0.85263875 0.8553229 0.84497356 0.84756423 0.8578468 0.84488008 0.85267881 0.86561882 0.84756423] mean value: 0.8526975054751349 key: test_jcc value: [0.55172414 0.6 0.46666667 0.70833333 0.66666667 0.5 0.73076923 0.62068966 0.5862069 0.58333333] mean value: 0.6014389920424403 key: train_jcc value: [0.75113122 0.73611111 0.74887892 0.73214286 0.73660714 0.75446429 0.73913043 0.74666667 0.76470588 0.73423423] mean value: 0.744407276034812 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.80615783 2.84482694 3.85962248 3.31184196 2.43806434 3.70860004 4.42152381 3.72705698 4.61319971 3.57754469] mean value: 3.5308438777923583 key: score_time value: [0.01527119 0.0216074 0.01274061 0.03064179 0.015028 0.0196197 0.01770926 0.01731467 0.02244329 0.05502224] mean value: 0.022739815711975097 key: test_mcc value: [0.44468651 0.44227524 0.40088002 0.58134627 0.62770563 0.2581351 0.76789769 0.54609991 0.30151915 0.65153277] mean value: 0.5022078281156532 key: train_mcc value: [0.96904298 0.96393847 0.96383644 0.96899204 0.95870837 0.97932803 0.96383644 0.96899204 0.97417339 0.95865605] mean value: 0.9669504240144378 key: test_accuracy value: [0.72093023 0.72093023 0.69767442 0.79069767 0.81395349 0.62790698 0.88372093 0.76744186 0.65116279 0.81395349] mean value: 0.7488372093023256 key: train_accuracy value: [0.98449612 0.98191214 0.98191214 0.98449612 0.97932817 0.98966408 0.98191214 0.98449612 0.9870801 0.97932817] mean value: 0.9834625322997416 key: test_fscore value: [0.71428571 0.73913043 0.68292683 0.8 0.81818182 0.63636364 0.87804878 0.7826087 0.63414634 0.77777778] mean value: 0.7463470028263242 key: train_fscore value: [0.984375 0.98172324 0.98181818 0.98445596 0.97938144 0.98969072 0.98200514 0.98453608 0.9870801 0.97938144] mean value: 0.9834447313434314 key: test_precision value: [0.75 0.70833333 0.73684211 0.7826087 0.81818182 0.60869565 0.9 0.72 0.65 0.93333333] mean value: 0.760799493793773 key: train_precision value: [0.9895288 0.98947368 0.984375 0.98445596 0.97435897 0.98969072 0.97948718 0.98453608 0.98963731 0.97938144] mean value: 0.9844925145539584 key: test_recall value: [0.68181818 0.77272727 0.63636364 0.81818182 0.81818182 0.66666667 0.85714286 0.85714286 0.61904762 0.66666667] mean value: 0.7393939393939394 key: train_recall value: [0.97927461 0.97409326 0.97927461 0.98445596 0.98445596 0.98969072 0.98453608 0.98453608 0.98453608 0.97938144] mean value: 0.9824234816516212 key: test_roc_auc value: [0.72186147 0.71969697 0.6991342 0.79004329 0.81385281 0.62878788 0.88311688 0.76948052 0.6504329 0.81060606] mean value: 0.7487012987012986 key: train_roc_auc value: [0.98448267 0.98189199 0.98190535 0.98449602 0.97934138 0.98966401 0.98190535 0.98449602 0.98708669 0.97932803] mean value: 0.983459751081673 key: test_jcc value: [0.55555556 0.5862069 0.51851852 0.66666667 0.69230769 0.46666667 0.7826087 0.64285714 0.46428571 0.63636364] mean value: 0.6012037185425492 key: train_jcc value: [0.96923077 0.96410256 0.96428571 0.96938776 0.95959596 0.97959184 0.96464646 0.96954315 0.9744898 0.95959596] mean value: 0.9674469966420656 MCC on Blind test: 0.31 Accuracy on Blind test: 0.64 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04940581 0.03220487 0.02999425 0.03011823 0.02668381 0.03157115 0.03161311 0.03097177 0.03061152 0.03121257] mean value: 0.03243870735168457 key: score_time value: [0.01049972 0.01050186 0.01050282 0.01048779 0.01052856 0.01086879 0.0111475 0.01051283 0.01061201 0.01057076] mean value: 0.01062326431274414 key: test_mcc value: [0.49456394 0.30151915 0.35141081 0.53463203 0.91106505 0.36709713 0.86117339 0.67532468 0.63123793 0.55391636] mean value: 0.5681940463108934 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.74418605 0.65116279 0.6744186 0.76744186 0.95348837 0.6744186 0.93023256 0.8372093 0.81395349 0.76744186] mean value: 0.7813953488372093 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73170732 0.66666667 0.66666667 0.77272727 0.95238095 0.70833333 0.92682927 0.8372093 0.81818182 0.72222222] mean value: 0.7802924819870367 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.78947368 0.65217391 0.7 0.77272727 1. 0.62962963 0.95 0.81818182 0.7826087 0.86666667] mean value: 0.7961461680111566 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.68181818 0.68181818 0.63636364 0.77272727 0.90909091 0.80952381 0.9047619 0.85714286 0.85714286 0.61904762] mean value: 0.7729437229437229 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.745671 0.6504329 0.67532468 0.76731602 0.95454545 0.67748918 0.92965368 0.83766234 0.81493506 0.76406926] mean value: 0.7817099567099567 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57692308 0.5 0.5 0.62962963 0.90909091 0.5483871 0.86363636 0.72 0.69230769 0.56521739] mean value: 0.6505192159666213 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.47 Accuracy on Blind test: 0.73 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.1436162 0.14201403 0.1432929 0.14335656 0.14548445 0.14574623 0.14563632 0.14425778 0.13957262 0.15334868] mean value: 0.14463257789611816 key: score_time value: [0.02052784 0.02075601 0.02071357 0.02085137 0.02147961 0.02125859 0.02084136 0.02108717 0.02033615 0.02889132] mean value: 0.021674299240112306 key: test_mcc value: [0.53796222 0.35185603 0.3961039 0.67532468 0.62770563 0.3071961 0.58824786 0.65585036 0.39479486 0.49916256] mean value: 0.5034204185252775 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.76744186 0.6744186 0.69767442 0.8372093 0.81395349 0.65116279 0.79069767 0.81395349 0.69767442 0.74418605] mean value: 0.7488372093023256 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76190476 0.70833333 0.69767442 0.8372093 0.81818182 0.66666667 0.8 0.83333333 0.68292683 0.7027027 ] mean value: 0.7508933166321141 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.65384615 0.71428571 0.85714286 0.81818182 0.625 0.75 0.74074074 0.7 0.8125 ] mean value: 0.7471697284197284 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.77272727 0.68181818 0.81818182 0.81818182 0.71428571 0.85714286 0.95238095 0.66666667 0.61904762] mean value: 0.7627705627705628 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76839827 0.67207792 0.69805195 0.83766234 0.81385281 0.6525974 0.79220779 0.81709957 0.6969697 0.74134199] mean value: 0.749025974025974 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.61538462 0.5483871 0.53571429 0.72 0.69230769 0.5 0.66666667 0.71428571 0.51851852 0.54166667] mean value: 0.6052931256318352 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01237321 0.01223278 0.01213527 0.01217461 0.01231837 0.01222086 0.01203585 0.01226687 0.01231027 0.01201582] mean value: 0.012208390235900878 key: score_time value: [0.01035666 0.01029444 0.01033831 0.01035571 0.01038551 0.01044679 0.01041675 0.01035023 0.01033854 0.01028085] mean value: 0.010356378555297852 key: test_mcc value: [0.49456394 0.20824344 0.49456394 0.16485939 0.44227524 0.39696419 0.34848485 0.35141081 0.30666041 0.44155844] mean value: 0.36495846433707946 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.74418605 0.60465116 0.74418605 0.58139535 0.72093023 0.69767442 0.6744186 0.6744186 0.65116279 0.72093023] mean value: 0.6813953488372093 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73170732 0.62222222 0.73170732 0.57142857 0.73913043 0.66666667 0.66666667 0.68181818 0.59459459 0.71428571] mean value: 0.6720227686611568 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.78947368 0.60869565 0.78947368 0.6 0.70833333 0.72222222 0.66666667 0.65217391 0.6875 0.71428571] mean value: 0.6938824870146381 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.68181818 0.63636364 0.68181818 0.54545455 0.77272727 0.61904762 0.66666667 0.71428571 0.52380952 0.71428571] mean value: 0.6556277056277056 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.745671 0.6038961 0.745671 0.58225108 0.71969697 0.69588745 0.67424242 0.67532468 0.6482684 0.72077922] mean value: 0.6811688311688312 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57692308 0.4516129 0.57692308 0.4 0.5862069 0.5 0.5 0.51724138 0.42307692 0.55555556] mean value: 0.5087539811566508 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.01366973 1.98927975 2.00896358 1.99248004 2.00078726 1.97842026 2.0029223 2.10296893 2.33100224 1.73028731] mean value: 2.015078139305115 key: score_time value: [0.10578775 0.10516572 0.10582972 0.10600066 0.10558033 0.10556436 0.10616589 0.12238479 0.09266567 0.10008526] mean value: 0.10552301406860351 key: test_mcc value: [0.54609991 0.55391636 0.67462198 0.55959928 0.76839827 0.30151915 0.72077922 0.73471273 0.53463203 0.59541363] mean value: 0.5989692558350634 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.76744186 0.76744186 0.8372093 0.76744186 0.88372093 0.65116279 0.86046512 0.86046512 0.76744186 0.79069767] mean value: 0.7953488372093024 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.8 0.84444444 0.73684211 0.88372093 0.63414634 0.85714286 0.86956522 0.76190476 0.75675676] mean value: 0.7894523414599255 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 0.71428571 0.82608696 0.875 0.9047619 0.65 0.85714286 0.8 0.76190476 0.875 ] mean value: 0.8097515527950311 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.68181818 0.90909091 0.86363636 0.63636364 0.86363636 0.61904762 0.85714286 0.95238095 0.76190476 0.66666667] mean value: 0.7811688311688312 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76948052 0.76406926 0.83658009 0.77056277 0.88419913 0.6504329 0.86038961 0.86255411 0.76731602 0.78787879] mean value: 0.7953463203463204 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.66666667 0.73076923 0.58333333 0.79166667 0.46428571 0.75 0.76923077 0.61538462 0.60869565] mean value: 0.658003264851091 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.04522681 2.64263201 2.22396159 0.96589065 0.95259833 0.92633867 0.92470813 0.97479391 0.94780827 0.96445656] mean value: 1.256841492652893 key: score_time value: [0.19241905 0.23578048 0.18349552 0.27233052 0.15954733 0.20956826 0.12825727 0.14750695 0.12867522 0.26806164] mean value: 0.19256422519683838 key: test_mcc value: [0.58824786 0.49916256 0.67532468 0.61748053 0.81385281 0.39696419 0.81385281 0.65585036 0.62964308 0.63732414] mean value: 0.6327703013890973 key: train_mcc value: [0.90697612 0.86563218 0.89158365 0.88635453 0.88143837 0.88123732 0.8914826 0.88630415 0.90702706 0.88630415] mean value: 0.8884340130519578 key: test_accuracy value: [0.79069767 0.74418605 0.8372093 0.79069767 0.90697674 0.69767442 0.90697674 0.81395349 0.81395349 0.81395349] mean value: 0.8116279069767441 key: train_accuracy value: [0.95348837 0.93281654 0.94573643 0.94315245 0.94056848 0.94056848 0.94573643 0.94315245 0.95348837 0.94315245] mean value: 0.944186046511628 key: test_fscore value: [0.7804878 0.7755102 0.8372093 0.75675676 0.90909091 0.66666667 0.9047619 0.83333333 0.8 0.78947368] mean value: 0.805329056610536 key: train_fscore value: [0.95336788 0.93264249 0.94601542 0.94329897 0.94117647 0.94117647 0.94601542 0.94329897 0.95336788 0.94329897] mean value: 0.9443658935063983 key: test_precision value: [0.84210526 0.7037037 0.85714286 0.93333333 0.90909091 0.72222222 0.9047619 0.74074074 0.84210526 0.88235294] mean value: 0.8337559138487931 key: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( train_precision value: [0.95336788 0.93264249 0.93877551 0.93846154 0.92929293 0.93401015 0.94358974 0.94329897 0.95833333 0.94329897] mean value: 0.9415071508004521 key: test_recall value: [0.72727273 0.86363636 0.81818182 0.63636364 0.90909091 0.61904762 0.9047619 0.95238095 0.76190476 0.71428571] mean value: 0.7906926406926407 key: train_recall value: [0.95336788 0.93264249 0.95336788 0.94818653 0.95336788 0.94845361 0.94845361 0.94329897 0.94845361 0.94329897] mean value: 0.9472891405373645 key: test_roc_auc value: [0.79220779 0.74134199 0.83766234 0.79437229 0.90692641 0.69588745 0.90692641 0.81709957 0.81277056 0.81168831] mean value: 0.8116883116883117 key: train_roc_auc value: [0.95348806 0.93281609 0.9457561 0.94316543 0.94060146 0.94054805 0.94572939 0.94315208 0.95350142 0.94315208] mean value: 0.9441910154372095 key: test_jcc value: [0.64 0.63333333 0.72 0.60869565 0.83333333 0.5 0.82608696 0.71428571 0.66666667 0.65217391] mean value: 0.6794575569358178 key: train_jcc value: [0.91089109 0.87378641 0.89756098 0.89268293 0.88888889 0.88888889 0.89756098 0.89268293 0.91089109 0.89268293] mean value: 0.8946517095469907 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02952361 0.01485586 0.02655649 0.01511598 0.01541734 0.01484561 0.0150702 0.01474094 0.0148046 0.01679945] mean value: 0.017773008346557616 key: score_time value: [0.02834201 0.01304078 0.01354599 0.0139358 0.01270437 0.01640105 0.01323628 0.01268053 0.01396489 0.01378512] mean value: 0.015163683891296386 key: test_mcc value: [0.48917749 0.06753957 0.20995671 0.50454827 0.39696419 0.3030303 0.72451364 0.36709713 0.36709713 0.34859132] mean value: 0.3778515749103558 key: train_mcc value: [0.46826734 0.51459683 0.46884804 0.49038014 0.49958596 0.51988165 0.4373134 0.47321307 0.45779106 0.48339175] mean value: 0.48132692433520785 key: test_accuracy value: [0.74418605 0.53488372 0.60465116 0.74418605 0.69767442 0.65116279 0.86046512 0.6744186 0.6744186 0.6744186 ] mean value: 0.686046511627907 key: train_accuracy value: [0.73385013 0.75710594 0.73385013 0.74418605 0.74935401 0.75968992 0.71834625 0.73643411 0.72868217 0.74160207] mean value: 0.7403100775193798 key: test_fscore value: [0.74418605 0.56521739 0.60465116 0.71794872 0.72340426 0.65116279 0.86363636 0.70833333 0.70833333 0.65 ] mean value: 0.6936873394875245 key: train_fscore value: [0.73924051 0.75132275 0.74185464 0.75434243 0.75566751 0.76574307 0.72681704 0.74242424 0.73551637 0.74619289] mean value: 0.7459121456577962 key: test_precision value: [0.76190476 0.54166667 0.61904762 0.82352941 0.68 0.63636364 0.82608696 0.62962963 0.62962963 0.68421053] mean value: 0.6832068837844177 key: train_precision value: [0.72277228 0.76756757 0.7184466 0.72380952 0.73529412 0.74876847 0.70731707 0.72772277 0.71921182 0.735 ] mean value: 0.7305910229208082 key: test_recall value: [0.72727273 0.59090909 0.59090909 0.63636364 0.77272727 0.66666667 0.9047619 0.80952381 0.80952381 0.61904762] mean value: 0.7127705627705627 key: train_recall value: [0.75647668 0.7357513 0.76683938 0.78756477 0.77720207 0.78350515 0.74742268 0.75773196 0.75257732 0.75773196] mean value: 0.762280326905614 key: test_roc_auc value: [0.74458874 0.53354978 0.60497835 0.74675325 0.69588745 0.65151515 0.86147186 0.67748918 0.67748918 0.67316017] mean value: 0.6866883116883117 key: train_roc_auc value: [0.73390845 0.75705091 0.73393515 0.74429785 0.74942578 0.75962822 0.71827093 0.73637893 0.72862027 0.74156028] mean value: 0.7403076758720154 key: test_jcc value: [0.59259259 0.39393939 0.43333333 0.56 0.56666667 0.48275862 0.76 0.5483871 0.5483871 0.48148148] mean value: 0.536754628225151 key: train_jcc value: [0.58634538 0.60169492 0.58964143 0.60557769 0.60728745 0.62040816 0.57086614 0.59036145 0.58167331 0.5951417 ] mean value: 0.5948997627637519 MCC on Blind test: 0.48 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [9.16078043 7.99839926 6.91333866 8.29607415 7.58446097 6.60297179 2.58368206 3.86609006 6.45868564 7.47924662] mean value: 6.694372963905335 key: score_time value: [0.04491019 0.0193634 0.02580667 0.02789092 0.0248673 0.03136349 0.01814866 0.02837729 0.02448654 0.02403283] mean value: 0.026924729347229004 key: test_mcc value: [0.63123793 0.58134627 0.68193178 0.58824786 1. 0.53463203 0.9544491 0.72451364 0.62964308 0.77418983] mean value: 0.7100191509900233 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.81395349 0.79069767 0.8372093 0.79069767 1. 0.76744186 0.97674419 0.86046512 0.81395349 0.88372093] mean value: 0.8534883720930233 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.80952381 0.8 0.82926829 0.7804878 1. 0.76190476 0.97560976 0.86363636 0.8 0.87179487] mean value: 0.8492225660518343 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85 0.7826087 0.89473684 0.84210526 1. 0.76190476 1. 0.82608696 0.84210526 0.94444444] mean value: 0.8743992226944172 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77272727 0.81818182 0.77272727 0.72727273 1. 0.76190476 0.95238095 0.9047619 0.76190476 0.80952381] mean value: 0.8281385281385282 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.81493506 0.79004329 0.83874459 0.79220779 1. 0.76731602 0.97619048 0.86147186 0.81277056 0.88203463] mean value: 0.8535714285714285 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68 0.66666667 0.70833333 0.64 1. 0.61538462 0.95238095 0.76 0.66666667 0.77272727] mean value: 0.7462159507159507 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.64 Accuracy on Blind test: 0.82 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.09778094 0.12179923 0.07625556 0.07825637 0.16504908 0.09413481 0.17084694 0.09448075 0.09310031 0.12652278] mean value: 0.11182267665863037 key: score_time value: [0.03333974 0.05212307 0.01235008 0.02842474 0.04834318 0.03797722 0.03660226 0.02439833 0.02313018 0.03546667] mean value: 0.03321554660797119 key: test_mcc value: [0.44155844 0.16122349 0.21351219 0.72077922 0.53595916 0.16233766 0.67532468 0.50454827 0.34859132 0.4912706 ] mean value: 0.4255105029586544 key: train_mcc value: [0.78812563 0.80377755 0.81399076 0.77859243 0.75711768 0.86605933 0.81967357 0.76242255 0.81913359 0.77778965] mean value: 0.7986682745861051 key: test_accuracy value: [0.72093023 0.58139535 0.60465116 0.86046512 0.76744186 0.58139535 0.8372093 0.74418605 0.6744186 0.74418605] mean value: 0.7116279069767442 key: train_accuracy value: [0.89405685 0.90180879 0.90697674 0.88888889 0.87855297 0.93281654 0.90956072 0.88113695 0.90956072 0.88888889] mean value: 0.8992248062015504 key: test_fscore value: [0.72727273 0.60869565 0.58536585 0.86363636 0.7826087 0.57142857 0.8372093 0.76595745 0.65 0.71794872] mean value: 0.7110123330905096 key: train_fscore value: [0.89405685 0.90052356 0.90625 0.88594164 0.87855297 0.93193717 0.90813648 0.88265306 0.90956072 0.88888889] mean value: 0.8986501353235298 key: test_precision value: [0.72727273 0.58333333 0.63157895 0.86363636 0.75 0.57142857 0.81818182 0.69230769 0.68421053 0.77777778] mean value: 0.7099727757622495 key: train_precision value: [0.89175258 0.91005291 0.91099476 0.9076087 0.87628866 0.94680851 0.92513369 0.87373737 0.9119171 0.89119171] mean value: 0.9045485989721791 key: test_recall value: [0.72727273 0.63636364 0.54545455 0.86363636 0.81818182 0.57142857 0.85714286 0.85714286 0.61904762 0.66666667] mean value: 0.7162337662337662 key: train_recall value: [0.89637306 0.89119171 0.9015544 0.86528497 0.88082902 0.91752577 0.89175258 0.89175258 0.90721649 0.88659794] mean value: 0.8930078521446504 key: test_roc_auc value: [0.72077922 0.58008658 0.60606061 0.86038961 0.76623377 0.58116883 0.83766234 0.74675325 0.67316017 0.74242424] mean value: 0.7114718614718615 key: train_roc_auc value: [0.89406282 0.90178142 0.90696277 0.88882805 0.87855884 0.93285615 0.90960686 0.88110945 0.9095668 0.88889482] mean value: 0.8992227979274612 key: test_jcc value: [0.57142857 0.4375 0.4137931 0.76 0.64285714 0.4 0.72 0.62068966 0.48148148 0.56 ] mean value: 0.5607749954387885 key: train_jcc value: [0.80841121 0.81904762 0.82857143 0.7952381 0.78341014 0.87254902 0.83173077 0.78995434 0.83412322 0.8 ] mean value: 0.8163035845546233 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01479411 0.01501846 0.01490593 0.01448727 0.0149529 0.01491094 0.01492023 0.01497388 0.01489806 0.01489973] mean value: 0.014876151084899902 key: score_time value: [0.0127027 0.01303649 0.01297402 0.01274824 0.01299381 0.01302719 0.01308703 0.0130887 0.01266432 0.01305079] mean value: 0.012937331199645996 key: test_mcc value: [ 0.68193178 -0.03178209 0.53463203 0.49456394 0.35868355 0.44155844 0.73471273 0.29669666 0.40939224 0.20995671] mean value: 0.4130345986091948 key: train_mcc value: [0.39290214 0.48160516 0.42033642 0.41972722 0.41370285 0.42376414 0.39848498 0.45362978 0.44905556 0.4451171 ] mean value: 0.42983253533526955 key: test_accuracy value: [0.8372093 0.48837209 0.76744186 0.74418605 0.6744186 0.72093023 0.86046512 0.62790698 0.69767442 0.60465116] mean value: 0.7023255813953488 key: train_accuracy value: [0.69509044 0.73901809 0.70801034 0.70801034 0.70542636 0.71059432 0.69767442 0.72609819 0.72351421 0.72093023] mean value: 0.7134366925064599 key: test_fscore value: [0.82926829 0.56 0.77272727 0.73170732 0.72 0.71428571 0.86956522 0.69230769 0.72340426 0.60465116] mean value: 0.7217916924577928 key: train_fscore value: [0.71078431 0.75305623 0.72639225 0.72506083 0.72058824 0.72682927 0.71670702 0.73762376 0.73710074 0.73786408] mean value: 0.7292006730036351 key: test_precision value: [0.89473684 0.5 0.77272727 0.78947368 0.64285714 0.71428571 0.8 0.58064516 0.65384615 0.59090909] mean value: 0.6939481062231487 key: train_precision value: [0.6744186 0.71296296 0.68181818 0.68348624 0.68372093 0.68981481 0.67579909 0.70952381 0.70422535 0.69724771] mean value: 0.6913017687828286 key: test_recall value: [0.77272727 0.63636364 0.77272727 0.68181818 0.81818182 0.71428571 0.95238095 0.85714286 0.80952381 0.61904762] mean value: 0.7634199134199134 key: train_recall value: [0.75129534 0.79792746 0.77720207 0.77202073 0.76165803 0.76804124 0.7628866 0.76804124 0.77319588 0.78350515] mean value: 0.7715773730035789 key: test_roc_auc value: [0.83874459 0.48484848 0.76731602 0.745671 0.67099567 0.72077922 0.86255411 0.63311688 0.70021645 0.60497835] mean value: 0.702922077922078 key: train_roc_auc value: [0.6952353 0.73916992 0.70818867 0.70817531 0.70557128 0.71044549 0.69750548 0.72598953 0.7233855 0.72076812] mean value: 0.7134434592169222 key: test_jcc value: [0.70833333 0.38888889 0.62962963 0.57692308 0.5625 0.55555556 0.76923077 0.52941176 0.56666667 0.43333333] mean value: 0.5720473018267136 key: train_jcc value: [0.5513308 0.60392157 0.57034221 0.56870229 0.56321839 0.57088123 0.55849057 0.58431373 0.58365759 0.58461538] mean value: 0.573947374305626 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01928735 0.01799226 0.04592514 0.04086757 0.03325629 0.02434087 0.04326963 0.03842497 0.02026701 0.04748487] mean value: 0.03311159610748291 key: score_time value: [0.01329494 0.01605535 0.01757121 0.02583075 0.0133481 0.02770424 0.02780008 0.02710891 0.0231266 0.03068089] mean value: 0.022252106666564943 key: test_mcc value: [0.43082022 0.27790255 0.36986766 0.62964308 0.369787 0.25490741 0.60786632 0.39343507 0.15272164 0.63732414] mean value: 0.4124275077176357 key: train_mcc value: [0.51104387 0.53280469 0.5164767 0.72190175 0.52516542 0.78363736 0.47848443 0.60509569 0.43790144 0.72125289] mean value: 0.5833764240666351 key: test_accuracy value: [0.65116279 0.60465116 0.6744186 0.81395349 0.65116279 0.62790698 0.76744186 0.62790698 0.53488372 0.81395349] mean value: 0.6767441860465117 key: train_accuracy value: [0.71317829 0.72351421 0.71834625 0.85788114 0.71576227 0.89147287 0.68992248 0.78036176 0.66149871 0.86046512] mean value: 0.7612403100775194 key: test_fscore value: [0.48275862 0.71186441 0.73076923 0.82608696 0.73684211 0.6 0.80769231 0.72413793 0.66666667 0.78947368] mean value: 0.7076291909627427 key: train_fscore value: [0.60215054 0.78207739 0.77709611 0.86618005 0.77822581 0.88947368 0.76284585 0.81561822 0.74759152 0.86294416] mean value: 0.7884203340208182 key: test_precision value: [1. 0.56756757 0.63333333 0.79166667 0.6 0.63157895 0.67741935 0.56756757 0.51282051 0.88235294] mean value: 0.6864306891339249 key: train_precision value: [0.97674419 0.6442953 0.64189189 0.81651376 0.6369637 0.90860215 0.61858974 0.70411985 0.59692308 0.85 ] mean value: 0.7394643659027074 key: test_recall value: [0.31818182 0.95454545 0.86363636 0.86363636 0.95454545 0.57142857 1. 1. 0.95238095 0.71428571] mean value: 0.8192640692640693 key: train_recall value: [0.43523316 0.99481865 0.98445596 0.92227979 1. 0.87113402 0.99484536 0.96907216 1. 0.87628866] mean value: 0.9048127770952407 key: test_roc_auc value: [0.65909091 0.59632035 0.66991342 0.81277056 0.64393939 0.62662338 0.77272727 0.63636364 0.54437229 0.81168831] mean value: 0.6773809523809524 key: train_roc_auc value: [0.71246194 0.72421345 0.7190321 0.85804711 0.71649485 0.89152556 0.68913252 0.77987287 0.66062176 0.86042412] mean value: 0.7611826291330591 key: test_jcc value: [0.31818182 0.55263158 0.57575758 0.7037037 0.58333333 0.42857143 0.67741935 0.56756757 0.5 0.65217391] mean value: 0.5559340273944984 key: train_jcc value: [0.43076923 0.64214047 0.63545151 0.7639485 0.6369637 0.80094787 0.61661342 0.68864469 0.59692308 0.75892857] mean value: 0.6571331021062359 MCC on Blind test: 0.23 Accuracy on Blind test: 0.58 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03413296 0.01754665 0.02072024 0.05363464 0.04663396 0.03693771 0.04613781 0.04253745 0.0207994 0.01882339] mean value: 0.033790421485900876 key: score_time value: [0.01302671 0.01284456 0.01339936 0.0210731 0.02140474 0.02065182 0.01973844 0.02583551 0.0124917 0.01213956] mean value: 0.01726055145263672 key: test_mcc value: [0.32463131 0.21473308 0.48807056 0.59541363 0.57954841 0.34848485 0.72077922 0.50266669 0.21351219 0.65153277] mean value: 0.46393727002357993 key: train_mcc value: [0.71514737 0.71603467 0.76258185 0.69513224 0.60480936 0.71916537 0.69587799 0.67108207 0.71601841 0.6521176 ] mean value: 0.6947966921662997 key: test_accuracy value: [0.65116279 0.60465116 0.74418605 0.79069767 0.76744186 0.6744186 0.86046512 0.69767442 0.60465116 0.81395349] mean value: 0.7209302325581395 key: train_accuracy value: [0.85529716 0.85271318 0.87855297 0.83204134 0.77260982 0.85788114 0.84754522 0.81912145 0.85788114 0.81395349] mean value: 0.8387596899224806 key: test_fscore value: [0.59459459 0.66666667 0.75555556 0.81632653 0.72222222 0.66666667 0.85714286 0.76363636 0.62222222 0.77777778] mean value: 0.7242811457097171 key: train_fscore value: [0.84615385 0.86396181 0.88508557 0.85327314 0.70860927 0.86486486 0.84432718 0.84375 0.85639687 0.78571429] mean value: 0.8352136837990035 key: test_precision value: [0.73333333 0.5862069 0.73913043 0.74074074 0.92857143 0.66666667 0.85714286 0.61764706 0.58333333 0.93333333] mean value: 0.7386106083279556 key: train_precision value: [0.9005848 0.80088496 0.83796296 0.756 0.98165138 0.82629108 0.86486486 0.74409449 0.86772487 0.92957746] mean value: 0.850963685556325 key: test_recall value: [0.5 0.77272727 0.77272727 0.90909091 0.59090909 0.66666667 0.85714286 1. 0.66666667 0.66666667] mean value: 0.7402597402597403 key: train_recall value: [0.79792746 0.93782383 0.93782383 0.97927461 0.55440415 0.90721649 0.82474227 0.9742268 0.84536082 0.68041237] mean value: 0.8439212648896961 key: test_roc_auc value: [0.6547619 0.60064935 0.74350649 0.78787879 0.77164502 0.67424242 0.86038961 0.70454545 0.60606061 0.81060606] mean value: 0.7214285714285714 key: train_roc_auc value: [0.8551493 0.85293254 0.87870573 0.83242081 0.77204743 0.85775333 0.84760429 0.81871962 0.85791357 0.81429945] mean value: 0.8387546071256877 key: test_jcc value: [0.42307692 0.5 0.60714286 0.68965517 0.56521739 0.5 0.75 0.61764706 0.4516129 0.63636364] mean value: 0.5740715942350894 key: train_jcc value: [0.73333333 0.7605042 0.79385965 0.74409449 0.54871795 0.76190476 0.73059361 0.72972973 0.74885845 0.64705882] mean value: 0.7198654991002161 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.21367502 0.21543741 0.25512004 0.2126348 0.27796173 0.21113563 0.21479535 0.21336484 0.21369362 0.21332502] mean value: 0.22411434650421141 key: score_time value: [0.02114868 0.02118158 0.02099514 0.02095175 0.02381182 0.02107692 0.02110982 0.02105737 0.02100515 0.02121067] mean value: 0.02135488986968994 key: test_mcc value: [0.3961039 0.48917749 0.64040632 0.58824786 0.82901914 0.3961039 0.81385281 0.65585036 0.44227524 0.73248017] mean value: 0.5983517174178976 key: train_mcc value: [0.95870837 0.92259409 0.9329309 0.95865605 0.92769958 0.94316543 0.93803254 0.94326318 0.96383644 0.94316391] mean value: 0.9432050491425249 key: test_accuracy value: [0.69767442 0.74418605 0.81395349 0.79069767 0.90697674 0.69767442 0.90697674 0.81395349 0.72093023 0.86046512] mean value: 0.7953488372093023 key: train_accuracy value: [0.97932817 0.96124031 0.96640827 0.97932817 0.96382429 0.97157623 0.96899225 0.97157623 0.98191214 0.97157623] mean value: 0.9715762273901809 key: test_fscore value: [0.69767442 0.74418605 0.8 0.7804878 0.9 0.69767442 0.9047619 0.83333333 0.7 0.84210526] mean value: 0.7900223189852111 key: train_fscore value: [0.97938144 0.96143959 0.96658098 0.97927461 0.96391753 0.97157623 0.96923077 0.97186701 0.98200514 0.97172237] mean value: 0.9716995656744147 key: test_precision value: [0.71428571 0.76190476 0.88888889 0.84210526 1. 0.68181818 0.9047619 0.74074074 0.73684211 0.94117647] mean value: 0.821252403140948 key: train_precision value: [0.97435897 0.95408163 0.95918367 0.97927461 0.95897436 0.97409326 0.96428571 0.96446701 0.97948718 0.96923077] mean value: 0.9677437183183256 key: test_recall value: [0.68181818 0.72727273 0.72727273 0.72727273 0.81818182 0.71428571 0.9047619 0.95238095 0.66666667 0.76190476] mean value: 0.7681818181818182 key: train_recall value: [0.98445596 0.96891192 0.97409326 0.97927461 0.96891192 0.96907216 0.9742268 0.97938144 0.98453608 0.9742268 ] mean value: 0.9757090967362855 key: test_roc_auc value: [0.69805195 0.74458874 0.81601732 0.79220779 0.90909091 0.69805195 0.90692641 0.81709957 0.71969697 0.85822511] mean value: 0.795995670995671 key: train_roc_auc value: [0.97934138 0.96126008 0.96642808 0.97932803 0.9638374 0.97158271 0.96897869 0.97155601 0.98190535 0.97156936] mean value: 0.971578708402329 key: test_jcc value: [0.53571429 0.59259259 0.66666667 0.64 0.81818182 0.53571429 0.82608696 0.71428571 0.53846154 0.72727273] mean value: 0.6594976585411368 key: train_jcc value: [0.95959596 0.92574257 0.93532338 0.95939086 0.93034826 0.94472362 0.94029851 0.94527363 0.96464646 0.945 ] mean value: 0.9450343260628992 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.08050752 0.08281493 0.08780646 0.12561631 0.10441041 0.08479404 0.14258456 0.13446379 0.12838745 0.14794612] mean value: 0.11193315982818604 key: score_time value: [0.02324915 0.02340794 0.03211308 0.02946973 0.02444458 0.02654386 0.09909654 0.02820015 0.02767539 0.02461576] mean value: 0.033881616592407224 key: test_mcc value: [0.51986413 0.20995671 0.58824786 0.59970431 0.86929961 0.5421681 0.86117339 0.67462198 0.58134627 0.723327 ] mean value: 0.6169709354435585 key: train_mcc value: [0.96393847 0.96445208 0.94912625 0.95350142 0.96945581 0.98461498 0.94418052 0.9485255 0.93818785 0.96414836] mean value: 0.958013121376555 key: test_accuracy value: [0.74418605 0.60465116 0.79069767 0.79069767 0.93023256 0.76744186 0.93023256 0.8372093 0.79069767 0.86046512] mean value: 0.8046511627906977 key: train_accuracy value: [0.98191214 0.98191214 0.97416021 0.97674419 0.98449612 0.99224806 0.97157623 0.97416021 0.96899225 0.98191214] mean value: 0.9788113695090439 key: test_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [0.7027027 0.60465116 0.7804878 0.76923077 0.92682927 0.73684211 0.92682927 0.82926829 0.7804878 0.85 ] mean value: 0.7907329179011718 key: train_fscore value: [0.98172324 0.98153034 0.97354497 0.97674419 0.98421053 0.99220779 0.97097625 0.97395833 0.96875 0.98172324] mean value: 0.9785368882950292 key: test_precision value: [0.86666667 0.61904762 0.84210526 0.88235294 1. 0.82352941 0.95 0.85 0.8 0.89473684] mean value: 0.852843874391862 key: train_precision value: [0.98947368 1. 0.99459459 0.9742268 1. 1. 0.99459459 0.98421053 0.97894737 0.99470899] mean value: 0.9910756566969263 key: test_recall value: [0.59090909 0.59090909 0.72727273 0.68181818 0.86363636 0.66666667 0.9047619 0.80952381 0.76190476 0.80952381] mean value: 0.7406926406926407 key: train_recall value: [0.97409326 0.96373057 0.95336788 0.97927461 0.96891192 0.98453608 0.94845361 0.96391753 0.95876289 0.96907216] mean value: 0.9664120506383206 key: test_roc_auc value: [0.7478355 0.60497835 0.79220779 0.79329004 0.93181818 0.76515152 0.92965368 0.83658009 0.79004329 0.85930736] mean value: 0.8050865800865801 key: train_roc_auc value: [0.98189199 0.98186528 0.97410662 0.97675071 0.98445596 0.99226804 0.97163613 0.97418674 0.96901875 0.98194541] mean value: 0.9788125634314406 key: test_jcc value: [0.54166667 0.43333333 0.64 0.625 0.86363636 0.58333333 0.86363636 0.70833333 0.64 0.73913043] mean value: 0.6638069828722003 key: train_jcc value: [0.96410256 0.96373057 0.94845361 0.95454545 0.96891192 0.98453608 0.94358974 0.94923858 0.93939394 0.96410256] mean value: 0.9580605022182751 MCC on Blind test: 0.54 Accuracy on Blind test: 0.76 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.23403645 0.16865182 0.19663453 0.22662783 0.18694973 0.22256374 0.20538998 0.18926692 0.21955013 0.21190357] mean value: 0.20615746974945068 key: score_time value: [0.0319984 0.0327487 0.03266263 0.03267264 0.03217125 0.03329086 0.03301549 0.03223681 0.03828764 0.04429054] mean value: 0.03433749675750732 key: test_mcc value: [0.3961039 0.2567 0.53463203 0.44468651 0.58225108 0.0287681 0.44155844 0.30151915 0.25541126 0.45629995] mean value: 0.3697930425744557 key: train_mcc value: [0.96919751 0.96393847 0.96919751 0.97937979 0.97427611 0.97417339 0.97932803 0.97427816 0.96920078 0.97427816] mean value: 0.9727247919177897 key: test_accuracy value: [0.69767442 0.62790698 0.76744186 0.72093023 0.79069767 0.51162791 0.72093023 0.65116279 0.62790698 0.72093023] mean value: 0.6837209302325581 key: train_accuracy value: [0.98449612 0.98191214 0.98449612 0.98966408 0.9870801 0.9870801 0.98966408 0.9870801 0.98449612 0.9870801 ] mean value: 0.9863049095607235 key: test_fscore value: [0.69767442 0.66666667 0.77272727 0.71428571 0.79069767 0.55319149 0.71428571 0.63414634 0.61904762 0.66666667] mean value: 0.6829389577528027 key: train_fscore value: [0.98429319 0.98172324 0.98429319 0.98958333 0.98694517 0.9870801 0.98969072 0.98701299 0.984375 0.98701299] mean value: 0.9862009927113224 key: test_precision value: [0.71428571 0.61538462 0.77272727 0.75 0.80952381 0.5 0.71428571 0.65 0.61904762 0.8 ] mean value: 0.6945254745254745 key: train_precision value: [0.99470899 0.98947368 0.99470899 0.9947644 0.99473684 0.98963731 0.98969072 0.9947644 0.99473684 0.9947644 ] mean value: 0.9931986578905286 key: test_recall value: [0.68181818 0.72727273 0.77272727 0.68181818 0.77272727 0.61904762 0.71428571 0.61904762 0.61904762 0.57142857] mean value: 0.6779220779220779 key: train_recall value: [0.97409326 0.97409326 0.97409326 0.98445596 0.97927461 0.98453608 0.98969072 0.97938144 0.9742268 0.97938144] mean value: 0.9793226857539661 key: test_roc_auc value: [0.69805195 0.62554113 0.76731602 0.72186147 0.79112554 0.51406926 0.72077922 0.6504329 0.62770563 0.71753247] mean value: 0.6834415584415584 key: train_roc_auc value: [0.98446931 0.98189199 0.98446931 0.98965066 0.98705999 0.98708669 0.98966401 0.98710005 0.98452273 0.98710005] mean value: 0.986301479621815 key: test_jcc value: [0.53571429 0.5 0.62962963 0.55555556 0.65384615 0.38235294 0.55555556 0.46428571 0.44827586 0.5 ] mean value: 0.522521569783233 key: train_jcc value: [0.96907216 0.96410256 0.96907216 0.97938144 0.9742268 0.9744898 0.97959184 0.97435897 0.96923077 0.97435897] mean value: 0.9727885492023931 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.80391073 0.82702971 0.87435555 0.93781734 0.8904283 0.84583449 0.83670449 0.85518622 0.98863912 0.9060173 ] mean value: 0.8765923261642456 key: score_time value: [0.01313496 0.01316261 0.01317477 0.01316571 0.01292896 0.01285672 0.02555585 0.0127821 0.02496505 0.0128355 ] mean value: 0.015456223487854004 key: test_mcc value: [0.63123793 0.44155844 0.81778934 0.55959928 0.86117339 0.58824786 0.9544491 0.76839827 0.53463203 0.86117339] mean value: 0.7018259039875132 key: train_mcc value: [1. 1. 1. 1. 1. 0.99484522 1. 1. 1. 0.99484522] mean value: 0.9989690447011859 key: test_accuracy value: [0.81395349 0.72093023 0.90697674 0.76744186 0.93023256 0.79069767 0.97674419 0.88372093 0.76744186 0.93023256] mean value: 0.8488372093023255 key: train_accuracy value: [1. 1. 1. 1. 1. 0.99741602 1. 1. 1. 0.99741602] mean value: 0.999483204134367 key: test_fscore value: [0.80952381 0.72727273 0.9047619 0.73684211 0.93333333 0.8 0.97560976 0.88372093 0.76190476 0.92682927] mean value: 0.8459798596682496 key: train_fscore value: [1. 1. 1. 1. 1. 0.99742931 1. 1. 1. 0.99742931] mean value: 0.9994858611825193 key: test_precision value: [0.85 0.72727273 0.95 0.875 0.91304348 0.75 1. 0.86363636 0.76190476 0.95 ] mean value: 0.8640857331074723 key: train_precision value: [1. 1. 1. 1. 1. 0.99487179 1. 1. 1. 0.99487179] mean value: 0.9989743589743589 key: test_recall value: [0.77272727 0.72727273 0.86363636 0.63636364 0.95454545 0.85714286 0.95238095 0.9047619 0.76190476 0.9047619 ] mean value: 0.8335497835497836 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.81493506 0.72077922 0.90800866 0.77056277 0.92965368 0.79220779 0.97619048 0.88419913 0.76731602 0.92965368] mean value: 0.8493506493506493 key: train_roc_auc value: [1. 1. 1. 1. 1. 0.99740933 1. 1. 1. 0.99740933] mean value: 0.9994818652849741 key: test_jcc value: [0.68 0.57142857 0.82608696 0.58333333 0.875 0.66666667 0.95238095 0.79166667 0.61538462 0.86363636] mean value: 0.7425584126018908 key: train_jcc value: [1. 1. 1. 1. 1. 0.99487179 1. 1. 1. 0.99487179] mean value: 0.9989743589743589 MCC on Blind test: 0.52 Accuracy on Blind test: 0.76 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.08959627 0.09128714 0.10529256 0.07038856 0.07038212 0.07437181 0.12109137 0.10772467 0.06410241 0.20232749] mean value: 0.09965643882751465 key: score_time value: [0.02013779 0.02048373 0.02002215 0.02515244 0.02193284 0.03954339 0.03085089 0.02007246 0.01990652 0.01039696] mean value: 0.0228499174118042 key: test_mcc value: [0.58225108 0.32531323 0.20824344 0.16726499 0.59541363 0.06638793 0.30151915 0.42224772 0.50454827 0.35748709] mean value: 0.35306765339248786 key: train_mcc value: [0.81354434 0.73174697 0.94878037 0.60464608 0.89211899 0.91973714 0.92458182 0.9276481 0.92278309 0.63983041] mean value: 0.8325417304390078 key: test_accuracy value: [0.79069767 0.62790698 0.60465116 0.58139535 0.79069767 0.53488372 0.65116279 0.69767442 0.74418605 0.6744186 ] mean value: 0.6697674418604651 key: train_accuracy value: [0.89922481 0.8501292 0.97416021 0.76744186 0.94315245 0.95865633 0.96124031 0.96382429 0.96124031 0.79069767] mean value: 0.9069767441860466 key: test_fscore value: [0.79069767 0.72413793 0.62222222 0.65384615 0.81632653 0.47368421 0.63414634 0.73469388 0.76595745 0.69565217] mean value: 0.6911364562396014 key: train_fscore value: [0.90780142 0.86877828 0.9744898 0.81092437 0.94607843 0.95721925 0.96 0.96391753 0.96183206 0.82729211] mean value: 0.9178333245074515 key: test_precision value: [0.80952381 0.58333333 0.60869565 0.56666667 0.74074074 0.52941176 0.65 0.64285714 0.69230769 0.64 ] mean value: 0.6463536802309181 key: train_precision value: [0.83478261 0.77108434 0.95979899 0.6819788 0.89767442 0.99444444 0.99447514 0.96391753 0.94974874 0.70545455] mean value: 0.8753359555723473 key: test_recall value: [0.77272727 0.95454545 0.63636364 0.77272727 0.90909091 0.42857143 0.61904762 0.85714286 0.85714286 0.76190476] mean value: 0.7569264069264069 key: train_recall value: [0.99481865 0.99481865 0.98963731 1. 1. 0.92268041 0.92783505 0.96391753 0.9742268 1. ] mean value: 0.9767934405213397 key: test_roc_auc value: [0.79112554 0.62012987 0.6038961 0.57683983 0.78787879 0.53246753 0.6504329 0.7012987 0.74675325 0.67640693] mean value: 0.6687229437229437 key: train_roc_auc value: [0.89947118 0.85050211 0.9742001 0.76804124 0.94329897 0.95874953 0.96132685 0.96382405 0.96120667 0.79015544] mean value: 0.9070776133753539 key: test_jcc value: [0.65384615 0.56756757 0.4516129 0.48571429 0.68965517 0.31034483 0.46428571 0.58064516 0.62068966 0.53333333] mean value: 0.5357694774435597 key: train_jcc value: [0.83116883 0.768 0.95024876 0.6819788 0.89767442 0.91794872 0.92307692 0.93034826 0.92647059 0.70545455] mean value: 0.8532369838000908 MCC on Blind test: 0.07 Accuracy on Blind test: 0.55 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01539373 0.01527166 0.01535821 0.04245615 0.0374465 0.03744864 0.03409624 0.03687501 0.03697228 0.03376293] mean value: 0.030508136749267577 key: score_time value: [0.01220465 0.01212001 0.01203203 0.01598191 0.02313375 0.02380562 0.0238564 0.02015162 0.0226481 0.02359152] mean value: 0.018952560424804688 key: test_mcc value: [0.44155844 0.20835137 0.49456394 0.67532468 0.62964308 0.20824344 0.76789769 0.51986413 0.3961039 0.63732414] mean value: 0.49788747885318635 key: train_mcc value: [0.75720506 0.75208381 0.74703465 0.75718561 0.74177263 0.78294429 0.75193633 0.74677493 0.75193633 0.75734102] mean value: 0.7546214639709004 key: test_accuracy value: [0.72093023 0.60465116 0.74418605 0.8372093 0.81395349 0.60465116 0.88372093 0.74418605 0.69767442 0.81395349] mean value: 0.7465116279069768 key: train_accuracy value: [0.87855297 0.87596899 0.87338501 0.87855297 0.87080103 0.89147287 0.87596899 0.87338501 0.87596899 0.87855297] mean value: 0.8772609819121447 key: test_fscore value: [0.72727273 0.63829787 0.73170732 0.8372093 0.82608696 0.58536585 0.87804878 0.7755102 0.69767442 0.78947368] mean value: 0.7486647116576796 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:156: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:159: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [0.87917738 0.87434555 0.87468031 0.8772846 0.87179487 0.89175258 0.87628866 0.87403599 0.87628866 0.88040712] mean value: 0.8776055712937129 key: test_precision value: [0.72727273 0.6 0.78947368 0.85714286 0.79166667 0.6 0.9 0.67857143 0.68181818 0.88235294] mean value: 0.7508298486858859 key: train_precision value: [0.87244898 0.88359788 0.86363636 0.88421053 0.86294416 0.89175258 0.87628866 0.87179487 0.87628866 0.86934673] mean value: 0.8752309417948851 key: test_recall value: [0.72727273 0.68181818 0.68181818 0.81818182 0.86363636 0.57142857 0.85714286 0.9047619 0.71428571 0.71428571] mean value: 0.7534632034632035 key: train_recall value: [0.88601036 0.86528497 0.88601036 0.87046632 0.88082902 0.89175258 0.87628866 0.87628866 0.87628866 0.89175258] mean value: 0.8800972170290049 key: test_roc_auc value: [0.72077922 0.60281385 0.745671 0.83766234 0.81277056 0.6038961 0.88311688 0.7478355 0.69805195 0.81168831] mean value: 0.7464285714285714 key: train_roc_auc value: [0.87857219 0.87594146 0.87341755 0.87853213 0.87082688 0.89147214 0.87596816 0.87337749 0.87596816 0.87851878] mean value: 0.877259494685113 key: test_jcc value: [0.57142857 0.46875 0.57692308 0.72 0.7037037 0.4137931 0.7826087 0.63333333 0.53571429 0.65217391] mean value: 0.6058428683246899 key: train_jcc value: [0.78440367 0.77674419 0.77727273 0.78139535 0.77272727 0.80465116 0.77981651 0.77625571 0.77981651 0.78636364] mean value: 0.7819446739048318 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.20661426 0.38068581 0.32723403 0.37064242 0.38206387 0.34253049 0.34622693 0.36653638 0.34563804 0.40613103] mean value: 0.34743032455444334 key: score_time value: [0.02376938 0.02099776 0.02022552 0.02396584 0.0239017 0.02375197 0.0239141 0.02399564 0.02379417 0.02396965] mean value: 0.0232285737991333 key: test_mcc value: [0.3961039 0.4912706 0.49456394 0.72451364 0.723327 0.39479486 0.81385281 0.51986413 0.26318068 0.63732414] mean value: 0.5458795675605433 key: train_mcc value: [0.68476577 0.72095943 0.74703465 0.68476577 0.68053636 0.70564037 0.6434123 0.74677493 0.68517152 0.69551524] mean value: 0.6994576344704969 key: test_accuracy value: [0.69767442 0.74418605 0.74418605 0.86046512 0.86046512 0.69767442 0.90697674 0.74418605 0.62790698 0.81395349] mean value: 0.7697674418604651 key: train_accuracy value: [0.84237726 0.86046512 0.87338501 0.84237726 0.83979328 0.85271318 0.82170543 0.87338501 0.84237726 0.84754522] mean value: 0.8496124031007752 key: test_fscore value: [0.69767442 0.76595745 0.73170732 0.85714286 0.86956522 0.68292683 0.9047619 0.7755102 0.65217391 0.78947368] mean value: 0.7726893792386329 key: train_fscore value: [0.84237726 0.859375 0.87468031 0.84237726 0.84343434 0.85496183 0.82262211 0.87403599 0.84556962 0.85063291] mean value: 0.8510066633696552 key: test_precision value: [0.71428571 0.72 0.78947368 0.9 0.83333333 0.7 0.9047619 0.67857143 0.6 0.88235294] mean value: 0.7722779006339378 key: train_precision value: [0.84020619 0.86387435 0.86363636 0.84020619 0.8226601 0.84422111 0.82051282 0.87179487 0.83084577 0.8358209 ] mean value: 0.8433778643344287 key: test_recall value: [0.68181818 0.81818182 0.68181818 0.81818182 0.90909091 0.66666667 0.9047619 0.9047619 0.71428571 0.71428571] mean value: 0.7813852813852814 key: train_recall value: [0.84455959 0.85492228 0.88601036 0.84455959 0.86528497 0.86597938 0.82474227 0.87628866 0.86082474 0.86597938] mean value: 0.8589151220554457 key: test_roc_auc value: [0.69805195 0.74242424 0.745671 0.86147186 0.85930736 0.6969697 0.90692641 0.7478355 0.62987013 0.81168831] mean value: 0.770021645021645 key: train_roc_auc value: [0.84238289 0.86045083 0.87341755 0.84238289 0.83985898 0.85267881 0.82169756 0.87337749 0.84232947 0.84749746] mean value: 0.8496073927674803 key: test_jcc value: [0.53571429 0.62068966 0.57692308 0.75 0.76923077 0.51851852 0.82608696 0.63333333 0.48387097 0.65217391] mean value: 0.636654147619955 key: train_jcc value: [0.72767857 0.75342466 0.77727273 0.72767857 0.72925764 0.74666667 0.69868996 0.77625571 0.73245614 0.74008811] mean value: 0.7409468746424365 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.06141329 0.0454545 0.0357089 0.03677487 0.05627537 0.0584569 0.03606534 0.02807307 0.0350523 0.03440022] mean value: 0.04276747703552246 key: score_time value: [0.0240252 0.01387882 0.01382113 0.01195884 0.01689172 0.01412845 0.01413083 0.01187396 0.01419377 0.01422119] mean value: 0.01491239070892334 key: test_mcc value: [0.5007734 0.77777778 0.61059098 0.66229864 0.19802951 0.60130719 0.54754393 0.37340802 0.54458115 0.60130719] mean value: 0.5417617785378652 key: train_mcc value: [0.74779462 0.70891756 0.71609411 0.75395088 0.76030097 0.7350822 0.73501314 0.72244119 0.70982126 0.72885068] mean value: 0.7318266584515561 key: test_accuracy value: [0.75 0.88888889 0.8 0.82857143 0.6 0.8 0.77142857 0.68571429 0.77142857 0.8 ] mean value: 0.7696031746031746 key: train_accuracy value: [0.87341772 0.85443038 0.85804416 0.87697161 0.88012618 0.86750789 0.86750789 0.86119874 0.85488959 0.86435331] mean value: 0.8658447470350996 key: test_fscore value: [0.75675676 0.88888889 0.81081081 0.83333333 0.5625 0.8 0.76470588 0.71794872 0.78947368 0.8 ] mean value: 0.7724418074301975 key: train_fscore value: [0.87012987 0.85350318 0.85893417 0.87774295 0.88125 0.86708861 0.86708861 0.85987261 0.85350318 0.86520376] mean value: 0.865431694395441 key: test_precision value: [0.73684211 0.88888889 0.75 0.78947368 0.6 0.77777778 0.8125 0.66666667 0.75 0.82352941] mean value: 0.7595678534571724 key: train_precision value: [0.89333333 0.85897436 0.85625 0.875 0.8757764 0.87261146 0.86708861 0.86538462 0.85897436 0.85714286] mean value: 0.8680535993888141 key: test_recall value: [0.77777778 0.88888889 0.88235294 0.88235294 0.52941176 0.82352941 0.72222222 0.77777778 0.83333333 0.77777778] mean value: 0.7895424836601307 key: train_recall value: [0.84810127 0.84810127 0.86163522 0.88050314 0.88679245 0.86163522 0.86708861 0.85443038 0.84810127 0.87341772] mean value: 0.8629806544064963 key: test_roc_auc value: [0.75 0.88888889 0.80228758 0.83006536 0.59803922 0.80065359 0.77287582 0.68300654 0.76960784 0.80065359] mean value: 0.7696078431372549 key: train_roc_auc value: [0.87341772 0.85443038 0.8580328 0.87696043 0.88010509 0.86752647 0.86750657 0.86117745 0.85486824 0.86438182] mean value: 0.8658406973967041 key: test_jcc value: [0.60869565 0.8 0.68181818 0.71428571 0.39130435 0.66666667 0.61904762 0.56 0.65217391 0.66666667] mean value: 0.6360658761528327 key: train_jcc value: [0.77011494 0.74444444 0.75274725 0.78212291 0.7877095 0.76536313 0.76536313 0.75418994 0.74444444 0.76243094] mean value: 0.7628930626743352 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.96169853 1.06569672 0.93234968 1.18725181 0.91562414 0.9073813 1.05254364 1.10178089 1.07017517 1.11856723] mean value: 1.0313069105148316 key: score_time value: [0.01202536 0.01782393 0.01223016 0.02358055 0.0121913 0.012362 0.01212907 0.02116275 0.01963162 0.01449251] mean value: 0.015762925148010254 key: test_mcc value: [0.5007734 0.72333935 0.60130719 0.66229864 0.25671802 0.4869281 0.66009836 0.31354672 0.54458115 0.60130719] mean value: 0.5350898118358979 key: train_mcc value: [0.7215768 0.77862138 0.74143974 0.70984435 0.7350822 0.71609411 0.67853599 0.75395088 0.76032005 0.74788981] mean value: 0.7343355311093412 key: test_accuracy value: [0.75 0.86111111 0.8 0.82857143 0.62857143 0.74285714 0.82857143 0.65714286 0.77142857 0.8 ] mean value: 0.7668253968253969 key: train_accuracy value: [0.86075949 0.88924051 0.87066246 0.85488959 0.86750789 0.85804416 0.83911672 0.87697161 0.88012618 0.87381703] mean value: 0.8671135646687698 key: test_fscore value: [0.75675676 0.85714286 0.8 0.83333333 0.58064516 0.74285714 0.84210526 0.68421053 0.78947368 0.8 ] mean value: 0.7686524725064623 key: train_fscore value: [0.85987261 0.88817891 0.87227414 0.85443038 0.86708861 0.85893417 0.83601286 0.87619048 0.88050314 0.875 ] mean value: 0.8668485307706836 key: test_precision value: [0.73684211 0.88235294 0.77777778 0.78947368 0.64285714 0.72222222 0.8 0.65 0.75 0.82352941] mean value: 0.7575055285272003 key: train_precision value: [0.86538462 0.89677419 0.86419753 0.85987261 0.87261146 0.85625 0.8496732 0.87898089 0.875 0.86419753] mean value: 0.8682942041428643 key: test_recall value: [0.77777778 0.83333333 0.82352941 0.88235294 0.52941176 0.76470588 0.88888889 0.72222222 0.83333333 0.77777778] mean value: 0.7833333333333333 key: train_recall value: [0.85443038 0.87974684 0.88050314 0.8490566 0.86163522 0.86163522 0.82278481 0.87341772 0.88607595 0.88607595] mean value: 0.8655361834248866 key: test_roc_auc value: [0.75 0.86111111 0.80065359 0.83006536 0.62581699 0.74346405 0.82679739 0.65522876 0.76960784 0.80065359] mean value: 0.7663398692810457 key: train_roc_auc value: [0.86075949 0.88924051 0.87063132 0.85490805 0.86752647 0.8580328 0.83906536 0.87696043 0.88014489 0.87385558] mean value: 0.8671124910437067 key: test_jcc value: [0.60869565 0.75 0.66666667 0.71428571 0.40909091 0.59090909 0.72727273 0.52 0.65217391 0.66666667] mean value: 0.6305761340109166 key: train_jcc value: [0.75418994 0.79885057 0.77348066 0.74585635 0.76536313 0.75274725 0.71823204 0.77966102 0.78651685 0.77777778] mean value: 0.765267560951859 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01380801 0.00991035 0.00968146 0.00964808 0.00931549 0.00928283 0.00930619 0.00949693 0.00936675 0.00932574] mean value: 0.009914183616638183 key: score_time value: [0.01205993 0.00925589 0.00926042 0.00894856 0.00876904 0.00881219 0.00875783 0.00880146 0.00878906 0.00873899] mean value: 0.00921933650970459 key: test_mcc value: [0.23570226 0.35355339 0.34908996 0.5289947 0.03300492 0.39285636 0.57348878 0.38195106 0.38195106 0.42397369] mean value: 0.36545661866638957 key: train_mcc value: [0.37958125 0.37616279 0.40833467 0.38788612 0.40373937 0.39377873 0.37666364 0.38474063 0.41515065 0.37598307] mean value: 0.3902020926867929 key: test_accuracy value: [0.61111111 0.66666667 0.6 0.71428571 0.51428571 0.65714286 0.77142857 0.68571429 0.68571429 0.68571429] mean value: 0.6592063492063492 key: train_accuracy value: [0.67405063 0.67088608 0.68769716 0.67192429 0.68769716 0.67823344 0.67192429 0.67507886 0.68454259 0.67507886] mean value: 0.677711336501218 key: test_fscore value: [0.66666667 0.71428571 0.70833333 0.77272727 0.54054054 0.72727273 0.80952381 0.73170732 0.73170732 0.75555556] mean value: 0.7158320254051962 key: train_fscore value: [0.72823219 0.72774869 0.74015748 0.73469388 0.73740053 0.7357513 0.72631579 0.72965879 0.74226804 0.72386059] mean value: 0.7326087277953888 key: test_precision value: [0.58333333 0.625 0.5483871 0.62962963 0.5 0.59259259 0.70833333 0.65217391 0.65217391 0.62962963] mean value: 0.6121253441379668 key: train_precision value: [0.62443439 0.62053571 0.63513514 0.61802575 0.63761468 0.62555066 0.62162162 0.62331839 0.62608696 0.62790698] mean value: 0.6260230269863888 key: test_recall value: [0.77777778 0.83333333 1. 1. 0.58823529 0.94117647 0.94444444 0.83333333 0.83333333 0.94444444] mean value: 0.8696078431372549 key: train_recall value: [0.87341772 0.87974684 0.88679245 0.90566038 0.87421384 0.89308176 0.87341772 0.87974684 0.91139241 0.85443038] mean value: 0.8831900326407134 key: test_roc_auc value: [0.61111111 0.66666667 0.61111111 0.72222222 0.51633987 0.66503268 0.76633987 0.68137255 0.68137255 0.67810458] mean value: 0.6599673202614379 key: train_roc_auc value: [0.67405063 0.67088608 0.68706711 0.67118462 0.68710692 0.67755354 0.67255792 0.67572247 0.68525595 0.67564286] mean value: 0.6777028102858053 key: test_jcc value: [0.5 0.55555556 0.5483871 0.62962963 0.37037037 0.57142857 0.68 0.57692308 0.57692308 0.60714286] mean value: 0.5616360234747332 key: train_jcc value: [0.57261411 0.57201646 0.5875 0.58064516 0.58403361 0.58196721 0.57024793 0.57438017 0.59016393 0.56722689] mean value: 0.5780795480995707 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00951147 0.00954366 0.00955582 0.00958872 0.00967336 0.00975108 0.00970745 0.00974202 0.00979972 0.00995803] mean value: 0.00968313217163086 key: score_time value: [0.00868154 0.00875759 0.00878048 0.00882292 0.00884938 0.00890732 0.00891852 0.00897455 0.00908208 0.00913715] mean value: 0.008891153335571288 key: test_mcc value: [0.16903085 0.5007734 0.7261082 0.56011203 0.14098436 0.21004201 0.71475794 0.31372549 0.25816993 0.37049379] mean value: 0.39641980149576833 key: train_mcc value: [0.50009015 0.48116688 0.48292914 0.50199282 0.51419131 0.52757592 0.48983547 0.48983547 0.49635204 0.49546107] mean value: 0.4979430278221952 key: test_accuracy value: [0.58333333 0.75 0.85714286 0.77142857 0.57142857 0.6 0.85714286 0.65714286 0.62857143 0.68571429] mean value: 0.6961904761904761 key: train_accuracy value: [0.75 0.74050633 0.74132492 0.75078864 0.75709779 0.76340694 0.7444795 0.7444795 0.74763407 0.74763407] mean value: 0.7487351754981432 key: test_fscore value: [0.54545455 0.74285714 0.86486486 0.78947368 0.54545455 0.63157895 0.86486486 0.66666667 0.62857143 0.7027027 ] mean value: 0.6982489393015708 key: train_fscore value: [0.7523511 0.73717949 0.74691358 0.75692308 0.75862069 0.7706422 0.75076923 0.75076923 0.75460123 0.75 ] mean value: 0.7528769821550523 key: test_precision value: [0.6 0.76470588 0.8 0.71428571 0.5625 0.57142857 0.84210526 0.66666667 0.64705882 0.68421053] mean value: 0.6852961447736989 key: train_precision value: [0.74534161 0.74675325 0.73333333 0.74096386 0.75625 0.75 0.73053892 0.73053892 0.73214286 0.74074074] mean value: 0.7406603492610074 key: test_recall value: [0.5 0.72222222 0.94117647 0.88235294 0.52941176 0.70588235 0.88888889 0.66666667 0.61111111 0.72222222] mean value: 0.7169934640522876 key: train_recall value: [0.75949367 0.7278481 0.76100629 0.77358491 0.76100629 0.79245283 0.7721519 0.7721519 0.77848101 0.75949367] mean value: 0.7657670567629966 key: test_roc_auc value: [0.58333333 0.75 0.85947712 0.7745098 0.57026144 0.60294118 0.85620915 0.65686275 0.62908497 0.68464052] mean value: 0.6967320261437908 key: train_roc_auc value: [0.75 0.74050633 0.74126264 0.7507165 0.75708542 0.76331502 0.74456652 0.74456652 0.74773107 0.74767136] mean value: 0.74874213836478 key: test_jcc value: [0.375 0.59090909 0.76190476 0.65217391 0.375 0.46153846 0.76190476 0.5 0.45833333 0.54166667] mean value: 0.5478430989300554 key: train_jcc value: [0.60301508 0.58375635 0.59605911 0.60891089 0.61111111 0.62686567 0.60098522 0.60098522 0.60591133 0.6 ] mean value: 0.6037599981096068 MCC on Blind test: 0.48 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00937033 0.0111382 0.01077557 0.01083684 0.0093565 0.01025081 0.01027107 0.01103401 0.01049685 0.01045322] mean value: 0.010398340225219727 key: score_time value: [0.0183351 0.01748371 0.01819801 0.01933193 0.01758265 0.01811266 0.01768136 0.01772475 0.01783299 0.01692319] mean value: 0.017920637130737306 key: test_mcc value: [ 0.27820744 0.55555556 0.08852507 0.32673202 -0.09978902 0.44342203 0.42810458 0.31372549 -0.023338 0.50238608] mean value: 0.28135312415157965 key: train_mcc value: [0.50669459 0.47545552 0.53965039 0.50845145 0.53965039 0.50845145 0.45126865 0.5711897 0.54579834 0.48317837] mean value: 0.5129788851827025 key: test_accuracy value: [0.63888889 0.77777778 0.54285714 0.65714286 0.45714286 0.71428571 0.71428571 0.65714286 0.48571429 0.74285714] mean value: 0.6388095238095238 key: train_accuracy value: [0.75316456 0.73734177 0.76971609 0.75394322 0.76971609 0.75394322 0.72555205 0.78548896 0.77287066 0.74132492] mean value: 0.7563061534161243 key: test_fscore value: [0.62857143 0.77777778 0.55555556 0.68421053 0.34482759 0.73684211 0.72222222 0.66666667 0.4375 0.7804878 ] mean value: 0.6334661673457543 key: train_fscore value: [0.75776398 0.72964169 0.77399381 0.7607362 0.77399381 0.7607362 0.72025723 0.7875 0.77358491 0.73376623] mean value: 0.7571974051856762 key: test_precision value: [0.64705882 0.77777778 0.52631579 0.61904762 0.41666667 0.66666667 0.72222222 0.66666667 0.5 0.69565217] mean value: 0.6238074405963758 key: train_precision value: [0.74390244 0.75167785 0.76219512 0.74251497 0.76219512 0.74251497 0.73202614 0.77777778 0.76875 0.75333333] mean value: 0.7536887730297543 key: test_recall value: [0.61111111 0.77777778 0.58823529 0.76470588 0.29411765 0.82352941 0.72222222 0.66666667 0.38888889 0.88888889] mean value: 0.6526143790849673 key: train_recall value: [0.7721519 0.70886076 0.78616352 0.77987421 0.78616352 0.77987421 0.70886076 0.79746835 0.77848101 0.71518987] mean value: 0.7613088129925961 key: test_roc_auc value: [0.63888889 0.77777778 0.54411765 0.66013072 0.45261438 0.71732026 0.71405229 0.65686275 0.48856209 0.73856209] mean value: 0.6388888888888888 key: train_roc_auc value: [0.75316456 0.73734177 0.76966404 0.75386116 0.76966404 0.75386116 0.72549956 0.78552663 0.77288831 0.74124274] mean value: 0.7562713955895232 key: test_jcc value: [0.45833333 0.63636364 0.38461538 0.52 0.20833333 0.58333333 0.56521739 0.5 0.28 0.64 ] mean value: 0.4776196412283369 key: train_jcc value: [0.61 0.57435897 0.63131313 0.61386139 0.63131313 0.61386139 0.56281407 0.64948454 0.63076923 0.57948718] mean value: 0.6097263025953108 MCC on Blind test: 0.02 Accuracy on Blind test: 0.52 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01542187 0.01616073 0.01565242 0.01571083 0.01529074 0.01559758 0.01642108 0.01565695 0.01558185 0.01572609] mean value: 0.015722012519836424 key: score_time value: [0.0105958 0.01050568 0.0106566 0.01063514 0.01046419 0.0108273 0.01060271 0.01053166 0.01047659 0.01044893] mean value: 0.01057446002960205 key: test_mcc value: [0.4472136 0.66666667 0.5104265 0.7261082 0.14002801 0.5104265 0.71568627 0.42810458 0.42810458 0.54248366] mean value: 0.511524856256581 key: train_mcc value: [0.70908803 0.65907322 0.71619687 0.69720133 0.72244119 0.68473245 0.69085626 0.70357543 0.70348698 0.70361082] mean value: 0.6990262571392772 key: test_accuracy value: [0.72222222 0.83333333 0.74285714 0.85714286 0.57142857 0.74285714 0.85714286 0.71428571 0.71428571 0.77142857] mean value: 0.7526984126984126 key: train_accuracy value: [0.85443038 0.82911392 0.85804416 0.84858044 0.86119874 0.84227129 0.84542587 0.85173502 0.85173502 0.85173502] mean value: 0.8494269855847941 key: test_fscore value: [0.73684211 0.83333333 0.76923077 0.86486486 0.51612903 0.76923077 0.85714286 0.72222222 0.72222222 0.77777778] mean value: 0.7568995953546038 key: train_fscore value: [0.8525641 0.82467532 0.85981308 0.85 0.8625 0.8447205 0.84444444 0.84984026 0.85173502 0.85266458] mean value: 0.8492957300856865 key: test_precision value: [0.7 0.83333333 0.68181818 0.8 0.57142857 0.68181818 0.88235294 0.72222222 0.72222222 0.77777778] mean value: 0.7372973431796961 key: train_precision value: [0.86363636 0.84666667 0.85185185 0.8447205 0.85714286 0.83435583 0.84713376 0.85806452 0.8490566 0.8447205 ] mean value: 0.8497349439171819 key: test_recall value: [0.77777778 0.83333333 0.88235294 0.94117647 0.47058824 0.88235294 0.83333333 0.72222222 0.72222222 0.77777778] mean value: 0.7843137254901961 key: train_recall value: [0.84177215 0.80379747 0.86792453 0.85534591 0.86792453 0.85534591 0.84177215 0.84177215 0.85443038 0.86075949] mean value: 0.84908446779715 key: test_roc_auc value: [0.72222222 0.83333333 0.74673203 0.85947712 0.56862745 0.74673203 0.85784314 0.71405229 0.71405229 0.77124183] mean value: 0.7534313725490196 key: train_roc_auc value: [0.85443038 0.82911392 0.8580129 0.84855903 0.86117745 0.84222992 0.84541438 0.85170369 0.85174349 0.85176339] mean value: 0.849414855505135 key: test_jcc value: [0.58333333 0.71428571 0.625 0.76190476 0.34782609 0.625 0.75 0.56521739 0.56521739 0.63636364] mean value: 0.6174148315452663 key: train_jcc value: [0.74301676 0.70165746 0.75409836 0.73913043 0.75824176 0.7311828 0.73076923 0.73888889 0.74175824 0.7431694 ] mean value: 0.7381913328042566 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.67205691 1.74800611 1.71937871 1.71159053 1.55831695 1.38682938 1.66299558 1.92333603 1.57990313 1.74328637] mean value: 1.6705699682235717 key: score_time value: [0.01249242 0.01816964 0.01529384 0.01299953 0.01308084 0.01252365 0.01541805 0.01910162 0.01590967 0.01836467] mean value: 0.015335392951965333 key: test_mcc value: [0.50709255 0.77777778 0.54754393 0.61059098 0.20406349 0.66229864 0.4869281 0.43605973 0.42810458 0.65686275] mean value: 0.5317322530882816 key: train_mcc value: [0.94320801 0.96837383 0.96847259 0.98738158 0.96847385 0.98738158 0.96222284 0.97476316 0.98746069 0.98109152] mean value: 0.9728829641992514 key: test_accuracy value: [0.75 0.88888889 0.77142857 0.8 0.6 0.82857143 0.74285714 0.71428571 0.71428571 0.82857143] mean value: 0.763888888888889 key: train_accuracy value: [0.97151899 0.98417722 0.98422713 0.99369085 0.98422713 0.99369085 0.98107256 0.9873817 0.99369085 0.99053628] mean value: 0.9864213552689374 key: test_fscore value: [0.72727273 0.88888889 0.77777778 0.81081081 0.5 0.83333333 0.74285714 0.75 0.72222222 0.83333333] mean value: 0.7586496236496236 key: train_fscore value: [0.97178683 0.98412698 0.98432602 0.99371069 0.98422713 0.99371069 0.98113208 0.98734177 0.99371069 0.99047619] mean value: 0.9864549079700586 key: test_precision value: [0.8 0.88888889 0.73684211 0.75 0.63636364 0.78947368 0.76470588 0.68181818 0.72222222 0.83333333] mean value: 0.7603647934452888 key: train_precision value: [0.96273292 0.98726115 0.98125 0.99371069 0.98734177 0.99371069 0.975 0.98734177 0.9875 0.99363057] mean value: 0.9849479566951478 key: test_recall value: [0.66666667 0.88888889 0.82352941 0.88235294 0.41176471 0.88235294 0.72222222 0.83333333 0.72222222 0.83333333] mean value: 0.7666666666666666 key: train_recall value: [0.98101266 0.98101266 0.98742138 0.99371069 0.98113208 0.99371069 0.98734177 0.98734177 1. 0.98734177] mean value: 0.9880025475678689 key: test_roc_auc value: [0.75 0.88888889 0.77287582 0.80228758 0.59477124 0.83006536 0.74346405 0.71078431 0.71405229 0.82843137] mean value: 0.7635620915032679 key: train_roc_auc value: [0.97151899 0.98417722 0.98421702 0.99369079 0.98423692 0.99369079 0.98109227 0.98738158 0.99371069 0.99052623] mean value: 0.9864242496616511 key: test_jcc value: [0.57142857 0.8 0.63636364 0.68181818 0.33333333 0.71428571 0.59090909 0.6 0.56521739 0.71428571] mean value: 0.620764163372859 key: train_jcc value: [0.94512195 0.96875 0.9691358 0.9875 0.9689441 0.9875 0.96296296 0.975 0.9875 0.98113208] mean value: 0.9733546891502192 MCC on Blind test: 0.17 Accuracy on Blind test: 0.58 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03031135 0.02292132 0.02027917 0.02237177 0.02224588 0.02189922 0.02142787 0.02358723 0.01928711 0.02066112] mean value: 0.0224992036819458 key: score_time value: [0.01231074 0.00908279 0.00853205 0.00865054 0.00881147 0.00856233 0.00856781 0.00854087 0.0085125 0.00855494] mean value: 0.009012603759765625 key: test_mcc value: [0.55555556 0.68376346 0.67680204 0.54458115 0.48524851 0.62873728 0.5815291 0.38195106 0.50238608 0.37340802] mean value: 0.5413962266125969 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.77777778 0.83333333 0.82857143 0.77142857 0.74285714 0.8 0.77142857 0.68571429 0.74285714 0.68571429] mean value: 0.763968253968254 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.77777778 0.8125 0.84210526 0.75 0.72727273 0.82051282 0.73333333 0.73170732 0.7804878 0.71794872] mean value: 0.7693645761954492 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.77777778 0.92857143 0.76190476 0.8 0.75 0.72727273 0.91666667 0.65217391 0.69565217 0.66666667] mean value: 0.7676686115816551 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 0.72222222 0.94117647 0.70588235 0.70588235 0.94117647 0.61111111 0.83333333 0.88888889 0.77777778] mean value: 0.7905228758169934 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77777778 0.83333333 0.83169935 0.76960784 0.74183007 0.80392157 0.77614379 0.68137255 0.73856209 0.68300654] mean value: 0.7637254901960785 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.63636364 0.68421053 0.72727273 0.6 0.57142857 0.69565217 0.57894737 0.57692308 0.64 0.56 ] mean value: 0.6270798080637897 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10861707 0.10948396 0.10891843 0.11251521 0.10827303 0.10929155 0.10902286 0.10885835 0.10935378 0.10915399] mean value: 0.10934882164001465 key: score_time value: [0.01715016 0.01729298 0.01747084 0.01766753 0.01737213 0.01722693 0.01735616 0.01723003 0.01750684 0.01739144] mean value: 0.017366504669189452 key: test_mcc value: [0.23570226 0.66666667 0.37955656 0.67680204 0.31774895 0.3180345 0.4869281 0.42906394 0.19934641 0.48524851] mean value: 0.41950979267853006 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.61111111 0.83333333 0.68571429 0.82857143 0.65714286 0.65714286 0.74285714 0.71428571 0.6 0.74285714] mean value: 0.7073015873015873 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.53333333 0.83333333 0.7027027 0.84210526 0.6 0.66666667 0.74285714 0.73684211 0.61111111 0.75675676] mean value: 0.7025708415182099 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.83333333 0.65 0.76190476 0.69230769 0.63157895 0.76470588 0.7 0.61111111 0.73684211] mean value: 0.7048450500308086 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.44444444 0.83333333 0.76470588 0.94117647 0.52941176 0.70588235 0.72222222 0.77777778 0.61111111 0.77777778] mean value: 0.7107843137254902 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.61111111 0.83333333 0.6879085 0.83169935 0.65359477 0.65849673 0.74346405 0.7124183 0.5996732 0.74183007] mean value: 0.7073529411764706 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.36363636 0.71428571 0.54166667 0.72727273 0.42857143 0.5 0.59090909 0.58333333 0.44 0.60869565] mean value: 0.5498370976849238 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.28 Accuracy on Blind test: 0.64 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01006556 0.00981927 0.00941396 0.00939035 0.00940633 0.0093112 0.00944066 0.00934458 0.009516 0.00944734] mean value: 0.009515523910522461 key: score_time value: [0.00883698 0.00861788 0.00850415 0.0085454 0.00852942 0.00849962 0.00850987 0.00850892 0.00852394 0.00889754] mean value: 0.008597373962402344 key: test_mcc value: [-0.1118034 0.56980288 0.15549417 0.4869281 -0.26403934 0.31354672 0.14852213 0.25573908 -0.023338 0.37254902] mean value: 0.19034013612049247 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.44444444 0.77777778 0.57142857 0.74285714 0.37142857 0.65714286 0.57142857 0.62857143 0.48571429 0.68571429] mean value: 0.5936507936507937 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.47368421 0.8 0.61538462 0.74285714 0.3125 0.625 0.54545455 0.64864865 0.4375 0.68571429] mean value: 0.5886743448585554 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.45 0.72727273 0.54545455 0.72222222 0.33333333 0.66666667 0.6 0.63157895 0.5 0.70588235] mean value: 0.5882410795259092 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.88888889 0.70588235 0.76470588 0.29411765 0.58823529 0.5 0.66666667 0.38888889 0.66666667] mean value: 0.5964052287581699 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.44444444 0.77777778 0.5751634 0.74346405 0.36928105 0.65522876 0.57352941 0.62745098 0.48856209 0.68627451] mean value: 0.5941176470588235 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.31034483 0.66666667 0.44444444 0.59090909 0.18518519 0.45454545 0.375 0.48 0.28 0.52173913] mean value: 0.4308834799771831 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.55 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.51522064 1.57600856 1.55131364 1.6932795 1.68040347 1.81337404 1.73380113 1.72802973 1.81885767 1.71407819] mean value: 1.6824366569519043 key: score_time value: [0.08909416 0.08941221 0.0988338 0.09735966 0.1245513 0.09853697 0.10165143 0.09904146 0.13339996 0.09609699] mean value: 0.10279779434204102 key: test_mcc value: [0.38949042 0.61205637 0.4869281 0.65686275 0.19802951 0.54248366 0.54754393 0.37049379 0.37340802 0.54248366] mean value: 0.4719780213530567 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.69444444 0.80555556 0.74285714 0.82857143 0.6 0.77142857 0.77142857 0.68571429 0.68571429 0.77142857] mean value: 0.7357142857142858 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.68571429 0.8 0.74285714 0.82352941 0.5625 0.76470588 0.76470588 0.7027027 0.71794872 0.77777778] mean value: 0.7342441803471215 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.70588235 0.82352941 0.72222222 0.82352941 0.6 0.76470588 0.8125 0.68421053 0.66666667 0.77777778] mean value: 0.7381024251805985 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.77777778 0.76470588 0.82352941 0.52941176 0.76470588 0.72222222 0.72222222 0.77777778 0.77777778] mean value: 0.7326797385620915 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.69444444 0.80555556 0.74346405 0.82843137 0.59803922 0.77124183 0.77287582 0.68464052 0.68300654 0.77124183] mean value: 0.7352941176470588 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.52173913 0.66666667 0.59090909 0.7 0.39130435 0.61904762 0.61904762 0.54166667 0.56 0.63636364] mean value: 0.5846744776962168 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.94859123 0.94930005 1.05498004 1.21180391 1.18700242 1.12215519 1.13519335 1.14038348 1.32968426 1.12452793] mean value: 1.1203621864318847 key: score_time value: [0.19991851 0.142133 0.15047169 0.15805769 0.15392733 0.15069294 0.15002155 0.16039896 0.16268349 0.15586495] mean value: 0.158417010307312 key: test_mcc value: [0.4472136 0.67082039 0.56011203 0.65686275 0.0825123 0.54248366 0.7261082 0.42906394 0.66009836 0.60000322] mean value: 0.5375278443260683 key: train_mcc value: [0.8734877 0.87341772 0.88014012 0.89276332 0.8864342 0.86119736 0.88014012 0.87389037 0.88014489 0.87381578] mean value: 0.8775431586883564 key: test_accuracy value: [0.72222222 0.83333333 0.77142857 0.82857143 0.54285714 0.77142857 0.85714286 0.71428571 0.82857143 0.8 ] mean value: 0.766984126984127 key: train_accuracy value: [0.93670886 0.93670886 0.94006309 0.94637224 0.94321767 0.93059937 0.94006309 0.93690852 0.94006309 0.93690852] mean value: 0.9387613305115202 key: test_fscore value: [0.73684211 0.82352941 0.78947368 0.82352941 0.5 0.76470588 0.84848485 0.73684211 0.84210526 0.81081081] mean value: 0.7676323523072749 key: train_fscore value: [0.93630573 0.93670886 0.94043887 0.94637224 0.94339623 0.93081761 0.93968254 0.93710692 0.94006309 0.93670886] mean value: 0.9387600951106223 key: test_precision value: [0.7 0.875 0.71428571 0.82352941 0.53333333 0.76470588 0.93333333 0.7 0.8 0.78947368] mean value: 0.7633661359280555 key:/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( train_precision value: [0.94230769 0.93670886 0.9375 0.94936709 0.94339623 0.93081761 0.94267516 0.93125 0.93710692 0.93670886] mean value: 0.9387838416386924 key: test_recall value: [0.77777778 0.77777778 0.88235294 0.82352941 0.47058824 0.76470588 0.77777778 0.77777778 0.88888889 0.83333333] mean value: 0.7774509803921569 key: train_recall value: [0.93037975 0.93670886 0.94339623 0.94339623 0.94339623 0.93081761 0.93670886 0.94303797 0.94303797 0.93670886] mean value: 0.9387588567789189 key: test_roc_auc value: [0.72222222 0.83333333 0.7745098 0.82843137 0.54084967 0.77124183 0.85947712 0.7124183 0.82679739 0.79901961] mean value: 0.7668300653594771 key: train_roc_auc value: [0.93670886 0.93670886 0.94005254 0.94638166 0.9432171 0.93059868 0.94005254 0.93692779 0.94007245 0.93690789] mean value: 0.9387628373537139 key: test_jcc value: [0.58333333 0.7 0.65217391 0.7 0.33333333 0.61904762 0.73684211 0.58333333 0.72727273 0.68181818] mean value: 0.6317154546445164 key: train_jcc value: [0.88023952 0.88095238 0.88757396 0.89820359 0.89285714 0.87058824 0.88622754 0.8816568 0.88690476 0.88095238] mean value: 0.8846156329874189 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01165247 0.01152277 0.01164889 0.01150632 0.01147509 0.01155019 0.01157808 0.01161909 0.01158905 0.0117197 ] mean value: 0.011586165428161621 key: score_time value: [0.01032376 0.01035714 0.01048326 0.01028204 0.01026344 0.01024675 0.01024652 0.01032281 0.0105052 0.01020026] mean value: 0.010323119163513184 key: test_mcc value: [0.16903085 0.5007734 0.7261082 0.56011203 0.14098436 0.21004201 0.71475794 0.31372549 0.25816993 0.37049379] mean value: 0.39641980149576833 key: train_mcc value: [0.50009015 0.48116688 0.48292914 0.50199282 0.51419131 0.52757592 0.48983547 0.48983547 0.49635204 0.49546107] mean value: 0.4979430278221952 key: test_accuracy value: [0.58333333 0.75 0.85714286 0.77142857 0.57142857 0.6 0.85714286 0.65714286 0.62857143 0.68571429] mean value: 0.6961904761904761 key: train_accuracy value: [0.75 0.74050633 0.74132492 0.75078864 0.75709779 0.76340694 0.7444795 0.7444795 0.74763407 0.74763407] mean value: 0.7487351754981432 key: test_fscore value: [0.54545455 0.74285714 0.86486486 0.78947368 0.54545455 0.63157895 0.86486486 0.66666667 0.62857143 0.7027027 ] mean value: 0.6982489393015708 key: train_fscore value: [0.7523511 0.73717949 0.74691358 0.75692308 0.75862069 0.7706422 0.75076923 0.75076923 0.75460123 0.75 ] mean value: 0.7528769821550523 key: test_precision value: [0.6 0.76470588 0.8 0.71428571 0.5625 0.57142857 0.84210526 0.66666667 0.64705882 0.68421053] mean value: 0.6852961447736989 key: train_precision value: [0.74534161 0.74675325 0.73333333 0.74096386 0.75625 0.75 0.73053892 0.73053892 0.73214286 0.74074074] mean value: 0.7406603492610074 key: test_recall value: [0.5 0.72222222 0.94117647 0.88235294 0.52941176 0.70588235 0.88888889 0.66666667 0.61111111 0.72222222] mean value: 0.7169934640522876 key: train_recall value: [0.75949367 0.7278481 0.76100629 0.77358491 0.76100629 0.79245283 0.7721519 0.7721519 0.77848101 0.75949367] mean value: 0.7657670567629966 key: test_roc_auc value: [0.58333333 0.75 0.85947712 0.7745098 0.57026144 0.60294118 0.85620915 0.65686275 0.62908497 0.68464052] mean value: 0.6967320261437908 key: train_roc_auc value: [0.75 0.74050633 0.74126264 0.7507165 0.75708542 0.76331502 0.74456652 0.74456652 0.74773107 0.74767136] mean value: 0.74874213836478 key: test_jcc value: [0.375 0.59090909 0.76190476 0.65217391 0.375 0.46153846 0.76190476 0.5 0.45833333 0.54166667] mean value: 0.5478430989300554 key: train_jcc value: [0.60301508 0.58375635 0.59605911 0.60891089 0.61111111 0.62686567 0.60098522 0.60098522 0.60591133 0.6 ] mean value: 0.6037599981096068 MCC on Blind test: 0.48 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [1.83040285 0.68204451 0.69062304 1.52602482 4.06419945 2.20627093 1.37749624 2.01253963 6.30247664 2.33498812] mean value: 2.3027066230773925 key: score_time value: [0.01224995 0.01245737 0.01228404 0.02493405 0.01278758 0.01422763 0.01192284 0.05001879 0.01314044 0.01353979] mean value: 0.01775624752044678 key: test_mcc value: [0.61977979 0.68376346 0.72347804 0.54458115 0.42810458 0.66229864 0.67680204 0.42810458 0.7261082 0.66229864] mean value: 0.6155319110264079 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.80555556 0.83333333 0.85714286 0.77142857 0.71428571 0.82857143 0.82857143 0.71428571 0.85714286 0.82857143] mean value: 0.8038888888888889 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82051282 0.8125 0.83870968 0.75 0.70588235 0.83333333 0.8125 0.72222222 0.84848485 0.82352941] mean value: 0.7967674666678463 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76190476 0.92857143 0.92857143 0.8 0.70588235 0.78947368 0.92857143 0.72222222 0.93333333 0.875 ] mean value: 0.8373530640326307 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 0.72222222 0.76470588 0.70588235 0.70588235 0.88235294 0.72222222 0.72222222 0.77777778 0.77777778] mean value: 0.7669934640522875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.80555556 0.83333333 0.85457516 0.76960784 0.71405229 0.83006536 0.83169935 0.71405229 0.85947712 0.83006536] mean value: 0.8042483660130719 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.69565217 0.68421053 0.72222222 0.6 0.54545455 0.71428571 0.68421053 0.56521739 0.73684211 0.7 ] mean value: 0.664809520507461 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.82 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05960512 0.07252216 0.08704185 0.09135747 0.08128071 0.08341789 0.07867599 0.08321714 0.08362532 0.11062169] mean value: 0.08313653469085694 key: score_time value: [0.02530026 0.02149534 0.01821113 0.02195978 0.02221489 0.02192664 0.02198982 0.02070355 0.0226922 0.02168036] mean value: 0.021817398071289063 key: test_mcc value: [0.40482045 0.55555556 0.54754393 0.5104265 0.02622965 0.54248366 0.37955656 0.31354672 0.20327978 0.54248366] mean value: 0.40259264696226943 key: train_mcc value: [0.81019149 0.81686482 0.80479562 0.81705278 0.84865252 0.80442658 0.81705278 0.79837556 0.79221096 0.8296712 ] mean value: 0.8139294318855579 key: test_accuracy value: [0.69444444 0.77777778 0.77142857 0.74285714 0.51428571 0.77142857 0.68571429 0.65714286 0.6 0.77142857] mean value: 0.6986507936507936 key: train_accuracy value: [0.90506329 0.90822785 0.9022082 0.90851735 0.92429022 0.9022082 0.90851735 0.89905363 0.89589905 0.9148265 ] mean value: 0.9068811643972368 key: test_fscore value: [0.73170732 0.77777778 0.77777778 0.76923077 0.48484848 0.76470588 0.66666667 0.68421053 0.58823529 0.77777778] mean value: 0.7022938273938802 key: train_fscore value: [0.9044586 0.90965732 0.90402477 0.90851735 0.92405063 0.90282132 0.90851735 0.9 0.89719626 0.9148265 ] mean value: 0.9074070097346472 key: test_precision value: [0.65217391 0.77777778 0.73684211 0.68181818 0.5 0.76470588 0.73333333 0.65 0.625 0.77777778] mean value: 0.6899428971366648 key: train_precision value: [0.91025641 0.89570552 0.8902439 0.91139241 0.92993631 0.9 0.90566038 0.88888889 0.88343558 0.91194969] mean value: 0.9027469079567659 key: test_recall value: [0.83333333 0.77777778 0.82352941 0.88235294 0.47058824 0.76470588 0.61111111 0.72222222 0.55555556 0.77777778] mean value: 0.7218954248366013 key: train_recall value: [0.89873418 0.92405063 0.91823899 0.90566038 0.91823899 0.90566038 0.91139241 0.91139241 0.91139241 0.91772152] mean value: 0.9122482286442162 key: test_roc_auc value: [0.69444444 0.77777778 0.77287582 0.74673203 0.5130719 0.77124183 0.6879085 0.65522876 0.60130719 0.77124183] mean value: 0.6991830065359477 key: train_roc_auc value: [0.90506329 0.90822785 0.90215747 0.90852639 0.92430937 0.90219728 0.90852639 0.89909243 0.89594777 0.9148356 ] mean value: 0.9068883846827482 key: test_jcc value: [0.57692308 0.63636364 0.63636364 0.625 0.32 0.61904762 0.5 0.52 0.41666667 0.63636364] mean value: 0.5486728271728272 key: train_jcc value: [0.8255814 0.83428571 0.82485876 0.83236994 0.85882353 0.82285714 0.83236994 0.81818182 0.81355932 0.84302326] mean value: 0.830591081938834 MCC on Blind test: 0.34 Accuracy on Blind test: 0.67 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0136795 0.0100081 0.01195359 0.00942755 0.00932813 0.00924706 0.00931287 0.00954461 0.00923276 0.00932527] mean value: 0.01010594367980957 key: score_time value: [0.00931573 0.00924492 0.00878453 0.00859833 0.00871873 0.008672 0.00852799 0.00857353 0.00871491 0.00868726] mean value: 0.008783793449401856 key: test_mcc value: [0.2236068 0.5007734 0.57177187 0.62873728 0.08112739 0.54754393 0.60000322 0.25573908 0.4869281 0.34381054] mean value: 0.4240041618135283 key: train_mcc value: [0.45265187 0.43386975 0.408835 0.40836591 0.45654605 0.44214672 0.41583903 0.44859316 0.43587821 0.45329575] mean value: 0.4356021447618876 key: test_accuracy value: [0.61111111 0.75 0.74285714 0.8 0.54285714 0.77142857 0.8 0.62857143 0.74285714 0.65714286] mean value: 0.7046825396825397 key: train_accuracy value: [0.72468354 0.71518987 0.70347003 0.70347003 0.72555205 0.7192429 0.70662461 0.72239748 0.71608833 0.72555205] mean value: 0.7162270894062213 key: test_fscore value: [0.63157895 0.75675676 0.79069767 0.82051282 0.46666667 0.77777778 0.81081081 0.64864865 0.74285714 0.72727273] mean value: 0.7173579973090377 key: train_fscore value: [0.74029851 0.73214286 0.71856287 0.71686747 0.74635569 0.73746313 0.72072072 0.73809524 0.73214286 0.73716012] mean value: 0.731980945751615 key: test_precision value: [0.6 0.73684211 0.65384615 0.72727273 0.53846154 0.73684211 0.78947368 0.63157895 0.76470588 0.61538462] mean value: 0.6794407759423239 key: train_precision value: [0.70056497 0.69101124 0.68571429 0.68786127 0.69565217 0.69444444 0.68571429 0.69662921 0.69101124 0.70520231] mean value: 0.6933805430745759 key: test_recall value: [0.66666667 0.77777778 1. 0.94117647 0.41176471 0.82352941 0.83333333 0.66666667 0.72222222 0.88888889] mean value: 0.773202614379085 key: train_recall value: [0.78481013 0.77848101 0.75471698 0.74842767 0.80503145 0.78616352 0.75949367 0.78481013 0.77848101 0.7721519 ] mean value: 0.7752567470742775 key: test_roc_auc value: [0.61111111 0.75 0.75 0.80392157 0.53921569 0.77287582 0.79901961 0.62745098 0.74346405 0.6503268 ] mean value: 0.7047385620915033 key: train_roc_auc value: [0.72468354 0.71518987 0.70330786 0.70332776 0.72530053 0.71903113 0.70679086 0.72259374 0.71628453 0.72569859] mean value: 0.7162208422896267 key: test_jcc value: [0.46153846 0.60869565 0.65384615 0.69565217 0.30434783 0.63636364 0.68181818 0.48 0.59090909 0.57142857] mean value: 0.5684599748078009 key: train_jcc value: [0.58767773 0.57746479 0.56074766 0.55868545 0.59534884 0.58411215 0.56338028 0.58490566 0.57746479 0.58373206] mean value: 0.5773519398369844 MCC on Blind test: 0.14 Accuracy on Blind test: 0.58 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0140748 0.01407027 0.01748443 0.01679134 0.01666498 0.01806664 0.01892018 0.01885033 0.02053499 0.01682544] mean value: 0.017228341102600096 key: score_time value: [0.00899243 0.01114273 0.011374 0.01192832 0.01193833 0.01202989 0.01195478 0.01200414 0.01204705 0.01194572] mean value: 0.011535739898681641 key: test_mcc value: [0.39440532 0.61977979 0.57177187 0.65792885 0.235008 0.5815291 0.37049379 0.31354672 0.36962466 0.54248366] mean value: 0.4656571762353827 key: train_mcc value: [0.75756235 0.67604194 0.6597786 0.5837075 0.52913772 0.68303764 0.70672566 0.74770113 0.56953939 0.64846186] mean value: 0.6561693780962766 key: test_accuracy value: [0.69444444 0.80555556 0.74285714 0.8 0.6 0.77142857 0.68571429 0.65714286 0.62857143 0.77142857] mean value: 0.7157142857142857 key: train_accuracy value: [0.87658228 0.83227848 0.81388013 0.76025237 0.7192429 0.83911672 0.85173502 0.87381703 0.74763407 0.82334385] mean value: 0.8137882841512598 key: test_fscore value: [0.66666667 0.82051282 0.79069767 0.82926829 0.66666667 0.8 0.7027027 0.68421053 0.73469388 0.77777778] mean value: 0.7473197005294976 key: train_fscore value: [0.86956522 0.84637681 0.83923706 0.80512821 0.78132678 0.84866469 0.85800604 0.87421384 0.79695431 0.81578947] mean value: 0.8335262428267585 key: test_precision value: [0.73333333 0.76190476 0.65384615 0.70833333 0.56 0.69565217 0.68421053 0.65 0.58064516 0.77777778] mean value: 0.6805703221714516 key: train_precision value: [0.92198582 0.78074866 0.74038462 0.67965368 0.64112903 0.80337079 0.82080925 0.86875 0.66525424 0.84931507] mean value: 0.7771401146853855 key: test_recall value: [0.61111111 0.88888889 1. 1. 0.82352941 0.94117647 0.72222222 0.72222222 1. 0.77777778] mean value: 0.8486928104575163 key: train_recall value: [0.82278481 0.92405063 0.96855346 0.98742138 1. 0.89937107 0.89873418 0.87974684 0.99367089 0.78481013] mean value: 0.9159143380304116 key: test_roc_auc value: [0.69444444 0.80555556 0.75 0.80555556 0.60620915 0.77614379 0.68464052 0.65522876 0.61764706 0.77124183] mean value: 0.7166666666666667 key: train_roc_auc value: [0.87658228 0.83227848 0.81339065 0.75953348 0.71835443 0.83892604 0.85188281 0.87383568 0.74840777 0.82322267] mean value: 0.8136414298224663 key: test_jcc value: [0.5 0.69565217 0.65384615 0.70833333 0.5 0.66666667 0.54166667 0.52 0.58064516 0.63636364] mean value: 0.6003173792079823 key: train_jcc value: [0.76923077 0.73366834 0.72300469 0.67381974 0.64112903 0.7371134 0.75132275 0.77653631 0.66244726 0.68888889] mean value: 0.7157161193028951 MCC on Blind test: 0.35 Accuracy on Blind test: 0.64 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01913738 0.0192678 0.0184958 0.02039719 0.01903605 0.01772308 0.01957345 0.0205121 0.01728296 0.01713586] mean value: 0.018856167793273926 key: score_time value: [0.00974441 0.01202059 0.01236725 0.01226926 0.01234198 0.01199961 0.01243424 0.01198888 0.01200747 0.01197147] mean value: 0.011914515495300293 key: test_mcc value: [0.40089186 0.66332496 0.46804587 0.51449576 0.19943817 0.70196412 0.67680204 0.14379085 0.31506302 0.31636434] mean value: 0.4400180983626669 key: train_mcc value: [0.47913671 0.66720064 0.54010335 0.49645901 0.77309659 0.59641779 0.66187375 0.73642815 0.31308494 0.56463751] mean value: 0.5828438444215096 key: test_accuracy value: [0.66666667 0.80555556 0.68571429 0.71428571 0.6 0.82857143 0.82857143 0.57142857 0.6 0.62857143] mean value: 0.692936507936508 key: train_accuracy value: [0.68670886 0.8164557 0.72870662 0.69716088 0.88643533 0.77917981 0.82334385 0.86750789 0.59305994 0.74132492] mean value: 0.7619883799864233 key: test_fscore value: [0.73913043 0.8372093 0.52173913 0.58333333 0.53333333 0.85 0.8125 0.57142857 0.72 0.72340426] mean value: 0.689207836095736 key: train_fscore value: [0.76144578 0.84153005 0.63247863 0.56756757 0.88819876 0.81283422 0.80141844 0.8627451 0.70880361 0.79396985] mean value: 0.7670992018926353 key: test_precision value: [0.60714286 0.72 1. 1. 0.61538462 0.73913043 0.92857143 0.58823529 0.5625 0.5862069 ] mean value: 0.7347171526550881 key: train_precision value: [0.61478599 0.74038462 0.98666667 1. 0.87730061 0.70697674 0.91129032 0.89189189 0.55087719 0.65833333] mean value: 0.7938507372740486 key: test_recall value: [0.94444444 1. 0.35294118 0.41176471 0.47058824 1. 0.72222222 0.55555556 1. 0.94444444] mean value: 0.7401960784313726 key: train_recall value: [1. 0.97468354 0.46540881 0.39622642 0.89937107 0.95597484 0.71518987 0.83544304 0.99367089 1. ] mean value: 0.8235968473847624 key: test_roc_auc value: [0.66666667 0.80555556 0.67647059 0.70588235 0.59640523 0.83333333 0.83169935 0.57189542 0.58823529 0.61928105] mean value: 0.6895424836601307 key: train_roc_auc value: [0.68670886 0.8164557 0.72953985 0.69811321 0.8863944 0.77862033 0.82300374 0.86740705 0.59431972 0.74213836] mean value: 0.7622701218055887 key: test_jcc value: [0.5862069 0.72 0.35294118 0.41176471 0.36363636 0.73913043 0.68421053 0.4 0.5625 0.56666667] mean value: 0.5387056770306093 key: train_jcc value: [0.61478599 0.72641509 0.4625 0.39622642 0.79888268 0.68468468 0.66863905 0.75862069 0.54895105 0.65833333] mean value: 0.6318038993094784 MCC on Blind test: 0.59 Accuracy on Blind test: 0.79 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.15341592 0.14559603 0.14337611 0.14296579 0.14716601 0.14489007 0.14460921 0.14914179 0.14565468 0.14958763] mean value: 0.14664032459259033 key: score_time value: [0.01507354 0.01512289 0.01575899 0.015625 0.01577687 0.01592374 0.01520467 0.01659775 0.01633358 0.01628613] mean value: 0.015770316123962402 key: test_mcc value: [0.4472136 0.61977979 0.66009836 0.83006536 0.25671802 0.54754393 0.66229864 0.31372549 0.49507377 0.60130719] mean value: 0.5433824138767602 key: train_mcc value: [0.95571534 0.94305686 0.94952631 0.94959991 0.98109152 0.94959991 0.943237 0.96862933 0.9498328 0.97484177] mean value: 0.9565130764088194 key: test_accuracy value: [0.72222222 0.80555556 0.82857143 0.91428571 0.62857143 0.77142857 0.82857143 0.65714286 0.74285714 0.8 ] mean value: 0.7699206349206349 key: train_accuracy value: [0.9778481 0.97151899 0.97476341 0.97476341 0.99053628 0.97476341 0.97160883 0.98422713 0.97476341 0.9873817 ] mean value: 0.9782174659585513 key: test_fscore value: [0.73684211 0.78787879 0.8125 0.91428571 0.58064516 0.77777778 0.82352941 0.66666667 0.72727273 0.8 ] mean value: 0.762739835219986 key: train_fscore value: [0.97791798 0.97142857 0.97484277 0.975 0.99059561 0.975 0.97160883 0.98432602 0.975 0.98742138] mean value: 0.9783141166346138 key: test_precision value: [0.7 0.86666667 0.86666667 0.88888889 0.64285714 0.73684211 0.875 0.66666667 0.8 0.82352941] mean value: 0.7867117548773895 key: train_precision value: [0.97484277 0.97452229 0.97484277 0.9689441 0.9875 0.9689441 0.96855346 0.97515528 0.96296296 0.98125 ] mean value: 0.9737517727928156 key: test_recall value: [0.77777778 0.72222222 0.76470588 0.94117647 0.52941176 0.82352941 0.77777778 0.66666667 0.66666667 0.77777778] mean value: 0.7447712418300654 key: train_recall value: [0.98101266 0.96835443 0.97484277 0.98113208 0.99371069 0.98113208 0.97468354 0.99367089 0.98734177 0.99367089] mean value: 0.9829551787278084 key: test_roc_auc value: [0.72222222 0.80555556 0.82679739 0.91503268 0.62581699 0.77287582 0.83006536 0.65686275 0.74509804 0.80065359] mean value: 0.7700980392156862 key: train_roc_auc value: [0.9778481 0.97151899 0.97476316 0.97474325 0.99052623 0.97474325 0.9716185 0.98425683 0.97480296 0.98740148] mean value: 0.9782222752965528 key: test_jcc value: [0.58333333 0.65 0.68421053 0.84210526 0.40909091 0.63636364 0.7 0.5 0.57142857 0.66666667] mean value: 0.62431989063568 key: train_jcc value: [0.95679012 0.94444444 0.95092025 0.95121951 0.98136646 0.95121951 0.94478528 0.9691358 0.95121951 0.97515528] mean value: 0.9576256167558563 MCC on Blind test: 0.64 Accuracy on Blind test: 0.82 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.05692339 0.08574867 0.07924795 0.07966113 0.06598711 0.07994103 0.08200693 0.07545042 0.05371451 0.06896377] mean value: 0.07276449203491211 key: score_time value: [0.02605581 0.02339387 0.0397501 0.0291419 0.02865005 0.03141642 0.01997232 0.02368736 0.01852202 0.02502227] mean value: 0.026561212539672852 key: test_mcc value: [0.61205637 0.68376346 0.60678804 0.42906394 0.42810458 0.60130719 0.67680204 0.37340802 0.42810458 0.66229864] mean value: 0.5501696847597503 key: train_mcc value: [0.94967147 0.96202532 0.98109227 0.96245424 0.96893923 0.97507568 0.93720141 0.97507176 0.9628305 0.95630718] mean value: 0.9630669051105084 key: test_accuracy value: [0.80555556 0.83333333 0.8 0.71428571 0.71428571 0.8 0.82857143 0.68571429 0.71428571 0.82857143] mean value: 0.7724603174603175 key: train_accuracy value: [0.97468354 0.98101266 0.99053628 0.98107256 0.98422713 0.9873817 0.96845426 0.9873817 0.98107256 0.97791798] mean value: 0.98137403665695 key: test_fscore value: [0.81081081 0.8125 0.77419355 0.6875 0.70588235 0.8 0.8125 0.71794872 0.72222222 0.82352941] mean value: 0.766708706407473 key: train_fscore value: [0.97435897 0.98101266 0.99053628 0.98089172 0.98402556 0.98726115 0.96794872 0.98717949 0.98064516 0.97749196] mean value: 0.9811351663370135 key: test_precision value: [0.78947368 0.92857143 0.85714286 0.73333333 0.70588235 0.77777778 0.92857143 0.66666667 0.72222222 0.875 ] mean value: 0.7984641751437417 key: train_precision value: [0.98701299 0.98101266 0.99367089 0.99354839 1. 1. 0.98051948 1. 1. 0.99346405] mean value: 0.9929228451220621 key: test_recall value: [0.83333333 0.72222222 0.70588235 0.64705882 0.70588235 0.82352941 0.72222222 0.77777778 0.72222222 0.77777778] mean value: 0.7437908496732026 key: train_recall value: [0.96202532 0.98101266 0.98742138 0.96855346 0.96855346 0.97484277 0.9556962 0.97468354 0.96202532 0.96202532] mean value: 0.969683942361277 key: test_roc_auc value: [0.80555556 0.83333333 0.79738562 0.7124183 0.71405229 0.80065359 0.83169935 0.68300654 0.71405229 0.83006536] mean value: 0.7722222222222223 key: train_roc_auc value: [0.97468354 0.98101266 0.99054613 0.98111217 0.98427673 0.98742138 0.96841414 0.98734177 0.98101266 0.977868 ] mean value: 0.9813689196720006 key: test_jcc value: [0.68181818 0.68421053 0.63157895 0.52380952 0.54545455 0.66666667 0.68421053 0.56 0.56521739 0.7 ] mean value: 0.6242966309053265 key: train_jcc value: [0.95 0.96273292 0.98125 0.9625 0.96855346 0.97484277 0.9378882 0.97468354 0.96202532 0.95597484] mean value: 0.9630451047954306 MCC on Blind test: 0.44 Accuracy on Blind test: 0.7 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.0508275 0.10509944 0.12613726 0.1361196 0.14902663 0.13211441 0.193712 0.17184043 0.16689801 0.1765039 ] mean value: 0.14082791805267333 key: score_time value: [0.02170348 0.02615643 0.02238178 0.02268672 0.04280734 0.02516341 0.04326391 0.04236627 0.03333473 0.03141975] mean value: 0.03112838268280029 key: test_mcc value: [0.23570226 0.55901699 0.21004201 0.48809353 0.19943817 0.43278921 0.54248366 0.25671802 0.08852507 0.43605973] mean value: 0.34488686499781873 key: train_mcc value: [0.97476164 0.98103231 0.97484177 0.98738158 0.98738158 0.98738158 0.98109152 0.98109152 0.98738158 0.99371044] mean value: 0.983605550432851 key: test_accuracy value: [0.61111111 0.77777778 0.6 0.71428571 0.6 0.71428571 0.77142857 0.62857143 0.54285714 0.71428571] mean value: 0.6674603174603174 key: train_accuracy value: [0.98734177 0.99050633 0.9873817 0.99369085 0.99369085 0.99369085 0.99053628 0.99053628 0.99369085 0.99684543] mean value: 0.9917911192748473 key: test_fscore value: [0.53333333 0.78947368 0.63157895 0.76190476 0.53333333 0.72222222 0.77777778 0.66666667 0.52941176 0.75 ] mean value: 0.6695702491522925 key: train_fscore value: [0.98726115 0.99047619 0.98734177 0.99371069 0.99371069 0.99371069 0.99047619 0.99047619 0.99367089 0.9968254 ] mean value: 0.991765984845033 key: test_precision value: [0.66666667 0.75 0.57142857 0.64 0.61538462 0.68421053 0.77777778 0.61904762 0.5625 0.68181818] mean value: 0.6568833958439222 key: train_precision value: [0.99358974 0.99363057 0.99363057 0.99371069 0.99371069 0.99371069 0.99363057 0.99363057 0.99367089 1. ] mean value: 0.9942914998131022 key: test_recall value: [0.44444444 0.83333333 0.70588235 0.94117647 0.47058824 0.76470588 0.77777778 0.72222222 0.5 0.83333333] mean value: 0.6993464052287581 key: train_recall value: [0.98101266 0.98734177 0.98113208 0.99371069 0.99371069 0.99371069 0.98734177 0.98734177 0.99367089 0.99367089] mean value: 0.9892643897778839 key: test_roc_auc value: [0.61111111 0.77777778 0.60294118 0.72058824 0.59640523 0.71568627 0.77124183 0.62581699 0.54411765 0.71078431] mean value: 0.6676470588235295 key: train_roc_auc value: [0.98734177 0.99050633 0.98740148 0.99369079 0.99369079 0.99369079 0.99052623 0.99052623 0.99369079 0.99683544] mean value: 0.9917900644853117 key: test_jcc value: [0.36363636 0.65217391 0.46153846 0.61538462 0.36363636 0.56521739 0.63636364 0.5 0.36 0.6 ] mean value: 0.5117950744907267 key: train_jcc value: [0.97484277 0.98113208 0.975 0.9875 0.9875 0.9875 0.98113208 0.98113208 0.98742138 0.99367089] mean value: 0.983683126343444 MCC on Blind test: 0.15 Accuracy on Blind test: 0.58 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.94957113 1.08033276 1.35335708 1.35058165 1.33091569 0.87500858 0.72606444 0.74691105 0.69290042 0.73771286] mean value: 0.9843355655670166 key: score_time value: [0.02747321 0.03321004 0.02398467 0.02301693 0.02378416 0.01277089 0.01287675 0.01277661 0.01323342 0.01306009] mean value: 0.019618678092956542 key: test_mcc value: [0.55901699 0.78262379 0.77561558 0.60678804 0.4869281 0.61059098 0.56011203 0.48524851 0.49507377 0.66229864] mean value: 0.6024296444327484 key: train_mcc value: [0.99369079 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9993690788750604 key: test_accuracy value: [0.77777778 0.88888889 0.88571429 0.8 0.74285714 0.8 0.77142857 0.74285714 0.74285714 0.82857143] mean value: 0.7980952380952381 key: train_accuracy value: [0.99683544 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9996835443037975 key: test_fscore value: [0.78947368 0.88235294 0.875 0.77419355 0.74285714 0.81081081 0.75 0.75675676 0.72727273 0.82352941] mean value: 0.7932247023236236 key: train_fscore value: [0.99684543 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9996845425867508 key: test_precision value: [0.75 0.9375 0.93333333 0.85714286 0.72222222 0.75 0.85714286 0.73684211 0.8 0.875 ] mean value: 0.8219183375104427 key: train_precision value: [0.99371069 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9993710691823899 key: test_recall value: [0.83333333 0.83333333 0.82352941 0.70588235 0.76470588 0.88235294 0.66666667 0.77777778 0.66666667 0.77777778] mean value: 0.773202614379085 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77777778 0.88888889 0.88398693 0.79738562 0.74346405 0.80228758 0.7745098 0.74183007 0.74509804 0.83006536] mean value: 0.7985294117647058 key: train_roc_auc value: [0.99683544 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9996835443037975 key: test_jcc value: [0.65217391 0.78947368 0.77777778 0.63157895 0.59090909 0.68181818 0.6 0.60869565 0.57142857 0.7 ] mean value: 0.660385581872996 key: train_jcc value: [0.99371069 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9993710691823899 MCC on Blind test: 0.64 Accuracy on Blind test: 0.82 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.10395193 0.06345844 0.06561041 0.06429601 0.07124066 0.10163808 0.06904268 0.06744313 0.0675621 0.0681169 ] mean value: 0.07423603534698486 key: score_time value: [0.01719022 0.0237124 0.02569318 0.02966261 0.02562976 0.02059865 0.03447008 0.03131509 0.02619195 0.03064704] mean value: 0.026511096954345705 key: test_mcc value: [0.3354102 0.52048344 0.08496732 0.46109408 0.19934641 0.34299717 0.36155076 0.31354672 0.14002801 0.19943817] mean value: 0.2958862272765599 key: train_mcc value: [0.66201856 0.86922699 0.80344002 0.70968872 0.71101985 0.68098909 0.86956126 0.76778044 0.76244394 0.92021604] mean value: 0.7756384901709313 key: test_accuracy value: [0.66666667 0.75 0.54285714 0.71428571 0.6 0.62857143 0.65714286 0.65714286 0.57142857 0.6 ] mean value: 0.6388095238095238 key: train_accuracy value: [0.81329114 0.93037975 0.89589905 0.84227129 0.83596215 0.8170347 0.93059937 0.88012618 0.86750789 0.95899054] mean value: 0.8772062053268378 key: test_fscore value: [0.64705882 0.7804878 0.52941176 0.75 0.58823529 0.71111111 0.57142857 0.68421053 0.61538462 0.65 ] mean value: 0.6527328511471078 key: train_fscore value: [0.83923706 0.93491124 0.90434783 0.86111111 0.85945946 0.84574468 0.92517007 0.88757396 0.88268156 0.96024465] mean value: 0.8900481622420955 key: test_precision value: [0.6875 0.69565217 0.52941176 0.65217391 0.58823529 0.57142857 0.8 0.65 0.57142857 0.59090909] mean value: 0.6336739379546285 key: train_precision value: [0.73684211 0.87777778 0.83870968 0.77114428 0.7535545 0.73271889 1. 0.83333333 0.79 0.92899408] mean value: 0.8263074651619711 key: test_recall value: [0.61111111 0.88888889 0.52941176 0.88235294 0.58823529 0.94117647 0.44444444 0.72222222 0.66666667 0.72222222] mean value: 0.699673202614379 key: train_recall value: [0.97468354 1. 0.98113208 0.97484277 1. 1. 0.86075949 0.94936709 1. 0.99367089] mean value: 0.9734455855425523 key: test_roc_auc value: [0.66666667 0.75 0.54248366 0.71895425 0.5996732 0.6372549 0.66339869 0.65522876 0.56862745 0.59640523] mean value: 0.6398692810457517 key: train_roc_auc value: [0.81329114 0.93037975 0.89562933 0.84185176 0.83544304 0.8164557 0.93037975 0.88034392 0.86792453 0.95909959] mean value: 0.8770798503303877 key: test_jcc value: [0.47826087 0.64 0.36 0.6 0.41666667 0.55172414 0.4 0.52 0.44444444 0.48148148] mean value: 0.48925776000888443 key: train_jcc value: [0.72300469 0.87777778 0.82539683 0.75609756 0.7535545 0.73271889 0.86075949 0.79787234 0.79 0.92352941] mean value: 0.8040711501225902 MCC on Blind test: 0.14 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.04062843 0.0383327 0.04687071 0.03227305 0.05511498 0.07243466 0.05638766 0.04591179 0.05584884 0.05544066] mean value: 0.04992434978485107 key: score_time value: [0.02106357 0.02162671 0.03051853 0.05224371 0.03813624 0.03405857 0.03598046 0.02535057 0.03507161 0.03812766] mean value: 0.033217763900756835 key: test_mcc value: [0.39440532 0.61205637 0.60130719 0.66229864 0.19943817 0.61059098 0.42810458 0.19802951 0.54754393 0.54248366] mean value: 0.47962583485985044 key: train_mcc value: [0.77862138 0.75955453 0.74768104 0.74768104 0.79816076 0.72246328 0.75409053 0.7665698 0.74763156 0.7350822 ] mean value: 0.7557536128515295 key: test_accuracy value: [0.69444444 0.80555556 0.8 0.82857143 0.6 0.8 0.71428571 0.6 0.77142857 0.77142857] mean value: 0.7385714285714285 key: train_accuracy value: [0.88924051 0.87974684 0.87381703 0.87381703 0.89905363 0.86119874 0.87697161 0.88328076 0.87381703 0.86750789] mean value: 0.8778451064169628 key: test_fscore value: [0.71794872 0.8 0.8 0.83333333 0.53333333 0.81081081 0.72222222 0.63157895 0.76470588 0.77777778] mean value: 0.7391711025147557 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:176: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:179: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [0.88817891 0.87898089 0.875 0.875 0.9 0.86075949 0.87774295 0.88253968 0.87341772 0.86792453] mean value: 0.8779544178197671 key: test_precision value: [0.66666667 0.82352941 0.77777778 0.78947368 0.61538462 0.75 0.72222222 0.6 0.8125 0.77777778] mean value: 0.7335332155804292 key: train_precision value: [0.89677419 0.88461538 0.86956522 0.86956522 0.89440994 0.86624204 0.86956522 0.88535032 0.87341772 0.8625 ] mean value: 0.8772005246432769 key: test_recall value: [0.77777778 0.77777778 0.82352941 0.88235294 0.47058824 0.88235294 0.72222222 0.66666667 0.72222222 0.77777778] mean value: 0.7503267973856209 key: train_recall value: [0.87974684 0.87341772 0.88050314 0.88050314 0.90566038 0.85534591 0.88607595 0.87974684 0.87341772 0.87341772] mean value: 0.8787835363426479 key: test_roc_auc value: [0.69444444 0.80555556 0.80065359 0.83006536 0.59640523 0.80228758 0.71405229 0.59803922 0.77287582 0.77124183] mean value: 0.7385620915032679 key: train_roc_auc value: [0.88924051 0.87974684 0.87379588 0.87379588 0.89903272 0.86121726 0.87700024 0.88326964 0.87381578 0.86752647] mean value: 0.8778441206910278 key: test_jcc value: [0.56 0.66666667 0.66666667 0.71428571 0.36363636 0.68181818 0.56521739 0.46153846 0.61904762 0.63636364] mean value: 0.5935240701327658 key: train_jcc value: [0.79885057 0.78409091 0.77777778 0.77777778 0.81818182 0.75555556 0.78212291 0.78977273 0.7752809 0.76666667] mean value: 0.7826077610940213 MCC on Blind test: 0.34 Accuracy on Blind test: 0.67 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.33978271 0.42522049 0.4941678 0.48616838 0.46958041 0.43947339 0.49444461 0.43981433 0.48236465 0.43516088] mean value: 0.45061776638031004 key: score_time value: [0.04065251 0.03120565 0.05050755 0.03590274 0.04430676 0.03336024 0.03085423 0.04242754 0.03187752 0.04001975] mean value: 0.03811144828796387 key: test_mcc value: [0.4472136 0.77777778 0.61059098 0.66229864 0.19802951 0.61059098 0.54754393 0.19802951 0.66009836 0.65686275] mean value: 0.536903603559525 key: train_mcc value: [0.72175032 0.71531883 0.70348698 0.70977629 0.74133195 0.72239471 0.69715787 0.7665698 0.69715787 0.69740407] mean value: 0.7172348695720924 key: test_accuracy value: [0.72222222 0.88888889 0.8 0.82857143 0.6 0.8 0.77142857 0.6 0.82857143 0.82857143] mean value: 0.7668253968253969 key: train_accuracy value: [0.86075949 0.85759494 0.85173502 0.85488959 0.87066246 0.86119874 0.84858044 0.88328076 0.84858044 0.84858044] mean value: 0.8585862316815078 key: test_fscore value: [0.73684211 0.88888889 0.81081081 0.83333333 0.5625 0.81081081 0.76470588 0.63157895 0.84210526 0.83333333] mean value: 0.7714909375319592 key: train_fscore value: [0.85897436 0.85623003 0.85173502 0.85534591 0.87147335 0.86163522 0.84810127 0.88253968 0.84810127 0.85 ] mean value: 0.858413610718881 key: test_precision value: [0.7 0.88888889 0.75 0.78947368 0.6 0.75 0.8125 0.6 0.8 0.83333333] mean value: 0.7524195906432749 key: train_precision value: [0.87012987 0.86451613 0.85443038 0.85534591 0.86875 0.86163522 0.84810127 0.88535032 0.84810127 0.83950617] mean value: 0.8595866533940849 key: test_recall value: [0.77777778 0.88888889 0.88235294 0.88235294 0.52941176 0.88235294 0.72222222 0.66666667 0.88888889 0.83333333] mean value: 0.7954248366013071 key: train_recall value: [0.84810127 0.84810127 0.8490566 0.85534591 0.87421384 0.86163522 0.84810127 0.87974684 0.84810127 0.86075949] mean value: 0.8573162964732107 key: test_roc_auc value: [0.72222222 0.88888889 0.80228758 0.83006536 0.59803922 0.80228758 0.77287582 0.59803922 0.82679739 0.82843137] mean value: 0.7669934640522875 key: train_roc_auc value: [0.86075949 0.85759494 0.85174349 0.85488815 0.87065122 0.86119736 0.84857893 0.88326964 0.84857893 0.84861874] mean value: 0.8585880901202134 key: test_jcc value: [0.58333333 0.8 0.68181818 0.71428571 0.39130435 0.68181818 0.61904762 0.46153846 0.72727273 0.71428571] mean value: 0.637470428122602 key: train_jcc value: [0.75280899 0.74860335 0.74175824 0.74725275 0.77222222 0.75690608 0.73626374 0.78977273 0.73626374 0.73913043] mean value: 0.7520982263883438 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.09176397 0.11980963 0.11011243 0.14975119 0.13808537 0.16043615 0.1992569 0.11476517 0.09389091 0.11426997] mean value: 0.12921416759490967 key: score_time value: [0.01855183 0.03585911 0.02424717 0.02689886 0.04189634 0.04716086 0.04669476 0.05341172 0.04221559 0.02758384] mean value: 0.03645200729370117 key: test_mcc value: [0.63123793 0.50454827 0.72077922 0.48917749 0.58134627 0.58824786 0.76789769 0.72451364 0.21040933 0.40088002] mean value: 0.5619037712320794 key: train_mcc value: [0.72610972 0.7158578 0.67961217 0.71059238 0.7002858 0.71059238 0.68500216 0.71062262 0.74164686 0.74226246] mean value: 0.7122584348906272 key: test_accuracy value: [0.81395349 0.74418605 0.86046512 0.74418605 0.79069767 0.79069767 0.88372093 0.86046512 0.60465116 0.69767442] mean value: 0.7790697674418604 key: train_accuracy value: [0.8630491 0.85788114 0.83979328 0.85529716 0.8501292 0.85529716 0.84237726 0.85529716 0.87080103 0.87080103] mean value: 0.8560723514211886 key: test_fscore value: [0.81818182 0.76595745 0.85714286 0.74418605 0.7804878 0.7804878 0.88888889 0.85714286 0.65306122 0.68292683] mean value: 0.7828463578190746 key: train_fscore value: [0.8630491 0.85714286 0.84102564 0.8556701 0.85128205 0.85492228 0.84398977 0.85416667 0.87113402 0.87309645] mean value: 0.8565478931750017 key: test_precision value: [0.7826087 0.69230769 0.85714286 0.72727273 0.8 0.84210526 0.86956522 0.9 0.59259259 0.73684211] mean value: 0.78004371507804 key: train_precision value: [0.86528497 0.86387435 0.83673469 0.8556701 0.84693878 0.85492228 0.83333333 0.85863874 0.86666667 0.85572139] mean value: 0.853778530840661 key: test_recall value: [0.85714286 0.85714286 0.85714286 0.76190476 0.76190476 0.72727273 0.90909091 0.81818182 0.72727273 0.63636364] mean value: 0.7913419913419913 key: train_recall value: [0.86082474 0.85051546 0.84536082 0.8556701 0.8556701 0.85492228 0.85492228 0.84974093 0.87564767 0.89119171] mean value: 0.8594466107579724 key: test_roc_auc value: [0.81493506 0.74675325 0.86038961 0.74458874 0.79004329 0.79220779 0.88311688 0.86147186 0.6017316 0.6991342 ] mean value: 0.7794372294372295 key: train_roc_auc value: [0.86305486 0.85790022 0.83977886 0.85529619 0.85011484 0.85529619 0.84240959 0.85528284 0.87081352 0.87085359] mean value: 0.856080070509054 key: test_jcc value: [0.69230769 0.62068966 0.75 0.59259259 0.64 0.64 0.8 0.75 0.48484848 0.51851852] mean value: 0.6488956943439702 key: train_jcc value: [0.75909091 0.75 0.72566372 0.74774775 0.74107143 0.74660633 0.7300885 0.74545455 0.7716895 0.77477477] mean value: 0.749218745058731 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [2.05319595 2.25262332 2.25992036 1.86790586 1.25109744 1.45810485 1.15412784 1.81505632 1.78134704 1.90901518] mean value: 1.7802394151687622 key: score_time value: [0.02578926 0.02598405 0.02641463 0.01519704 0.01256347 0.01458597 0.01378655 0.02464509 0.02310419 0.01562047] mean value: 0.01976907253265381 key: test_mcc value: [0.58225108 0.49456394 0.72077922 0.48917749 0.53463203 0.58824786 0.77418983 0.72451364 0.30265778 0.25490741] mean value: 0.5465920274282123 key: train_mcc value: [0.63313389 0.67962928 0.65393972 0.67974678 0.63325946 0.67488854 0.65375781 0.67451054 0.70543774 0.63857648] mean value: 0.6626880247473126 key: test_accuracy value: [0.79069767 0.74418605 0.86046512 0.74418605 0.76744186 0.79069767 0.88372093 0.86046512 0.65116279 0.62790698] mean value: 0.772093023255814 key: train_accuracy value: [0.81653747 0.83979328 0.82687339 0.83979328 0.81653747 0.8372093 0.82687339 0.8372093 0.85271318 0.81912145] mean value: 0.831266149870801 key: test_fscore value: [0.79069767 0.75555556 0.85714286 0.74418605 0.76190476 0.7804878 0.89361702 0.85714286 0.68085106 0.65217391] mean value: 0.7773759555704174 key: train_fscore value: [0.81841432 0.83937824 0.82951654 0.83854167 0.81933842 0.83969466 0.82687339 0.83804627 0.85271318 0.82142857] mean value: 0.8323945252809524 key: test_precision value: [0.77272727 0.70833333 0.85714286 0.72727273 0.76190476 0.84210526 0.84 0.9 0.64 0.625 ] mean value: 0.7674486215538847 key: train_precision value: [0.81218274 0.84375 0.81909548 0.84736842 0.80904523 0.825 0.82474227 0.83163265 0.85051546 0.80904523] mean value: 0.8272377476837611 key: test_recall value: [0.80952381 0.80952381 0.85714286 0.76190476 0.76190476 0.72727273 0.95454545 0.81818182 0.72727273 0.68181818] mean value: 0.7909090909090909 key: train_recall value: [0.82474227 0.83505155 0.84020619 0.82989691 0.82989691 0.85492228 0.82901554 0.84455959 0.85492228 0.83419689] mean value: 0.837741039474387 key: test_roc_auc value: [0.79112554 0.745671 0.86038961 0.74458874 0.76731602 0.79220779 0.88203463 0.86147186 0.64935065 0.62662338] mean value: 0.7720779220779221 key: train_roc_auc value: [0.81651621 0.83980557 0.82683884 0.83981892 0.81650286 0.83725495 0.82687891 0.83722825 0.85271887 0.8191603 ] mean value: 0.8312723679290636 key: test_jcc value: [0.65384615 0.60714286 0.75 0.59259259 0.61538462 0.64 0.80769231 0.75 0.51612903 0.48387097] mean value: 0.6416658526658526 key: train_jcc value: [0.69264069 0.72321429 0.70869565 0.72197309 0.69396552 0.72368421 0.70484581 0.72123894 0.74324324 0.6969697 ] mean value: 0.7130471145711001 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0431459 0.01635933 0.01623845 0.01651263 0.01639867 0.01648521 0.01640391 0.01852489 0.01559401 0.01632786] mean value: 0.019199085235595704 key: score_time value: [0.01493192 0.01494384 0.0147562 0.01472116 0.0147326 0.01476002 0.01519728 0.0136559 0.0147171 0.01477838] mean value: 0.014719438552856446 key: test_mcc value: [0.3071961 0.26318068 0.55959928 0.55959928 0.27394005 0.30666041 0.34318385 0.25490741 0.35868355 0.35185603] mean value: 0.35788066273155245 key: train_mcc value: [0.42261743 0.43687627 0.37528635 0.39033754 0.41539874 0.37697052 0.3880617 0.40675265 0.42447646 0.41278926] mean value: 0.4049566920759974 key: test_accuracy value: [0.65116279 0.62790698 0.76744186 0.76744186 0.60465116 0.65116279 0.65116279 0.62790698 0.6744186 0.6744186 ] mean value: 0.6697674418604651 key: train_accuracy value: [0.70542636 0.71059432 0.67700258 0.6873385 0.7002584 0.67958656 0.66925065 0.69509044 0.70542636 0.69767442] mean value: 0.692764857881137 key: test_fscore value: [0.66666667 0.65217391 0.79166667 0.79166667 0.69090909 0.69387755 0.72727273 0.65217391 0.72 0.70833333] mean value: 0.7094740528622516 key: train_fscore value: [0.73732719 0.74545455 0.72406181 0.7268623 0.73636364 0.72072072 0.73333333 0.73181818 0.73732719 0.73469388] mean value: 0.7327962785759218 key: test_precision value: [0.625 0.6 0.7037037 0.7037037 0.55882353 0.62962963 0.60606061 0.625 0.64285714 0.65384615] mean value: 0.6348624469212705 key: train_precision value: [0.66666667 0.66666667 0.63320463 0.64658635 0.65853659 0.6374502 0.61324042 0.65182186 0.66390041 0.65322581] mean value: 0.6491299598344551 key: test_recall value: [0.71428571 0.71428571 0.9047619 0.9047619 0.9047619 0.77272727 0.90909091 0.68181818 0.81818182 0.77272727] mean value: 0.8097402597402598 key: train_recall value: [0.82474227 0.84536082 0.84536082 0.82989691 0.83505155 0.82901554 0.9119171 0.83419689 0.82901554 0.83937824] mean value: 0.8423935687196197 key: test_roc_auc value: [0.6525974 0.62987013 0.77056277 0.77056277 0.61147186 0.6482684 0.64502165 0.62662338 0.67099567 0.67207792] mean value: 0.6698051948051948 key: train_roc_auc value: [0.70511725 0.71024518 0.67656642 0.68696918 0.69990919 0.67997169 0.66987607 0.69544896 0.70574489 0.69803963] mean value: 0.6927888467496394 key: test_jcc value: [0.5 0.48387097 0.65517241 0.65517241 0.52777778 0.53125 0.57142857 0.48387097 0.5625 0.5483871 ] mean value: 0.5519430209050621 key: train_jcc value: [0.58394161 0.5942029 0.56747405 0.57092199 0.58273381 0.56338028 0.57894737 0.57706093 0.58394161 0.58064516] mean value: 0.5783249700738864 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0163424 0.01679897 0.01675701 0.01685286 0.016855 0.01681352 0.01682734 0.01688409 0.01697421 0.01698279] mean value: 0.01680881977081299 key: score_time value: [0.01396537 0.01491928 0.01495099 0.01483655 0.01477695 0.01482654 0.01482534 0.01485395 0.0148468 0.01482868] mean value: 0.01476304531097412 key: test_mcc value: [0.35141081 0.25490741 0.53595916 0.4517935 0.31423621 0.48807056 0.4912706 0.2581351 0.44701207 0.3961039 ] mean value: 0.3988899295731957 key: train_mcc value: [0.50010976 0.51422314 0.45901322 0.48339175 0.48864075 0.49401307 0.46826734 0.50904089 0.47386097 0.47386097] mean value: 0.48644218743566964 key: test_accuracy value: [0.6744186 0.62790698 0.76744186 0.72093023 0.65116279 0.74418605 0.74418605 0.62790698 0.72093023 0.69767442] mean value: 0.6976744186046512 key: train_accuracy value: [0.74935401 0.75710594 0.72868217 0.74160207 0.74418605 0.74677003 0.73385013 0.75452196 0.73643411 0.73643411] mean value: 0.7428940568475452 key: test_fscore value: [0.68181818 0.6 0.75 0.73913043 0.68085106 0.75555556 0.76595745 0.61904762 0.75 0.69767442] mean value: 0.7040034720446915 key: train_fscore value: [0.75930521 0.75897436 0.74074074 0.74619289 0.74936709 0.75126904 0.73924051 0.75324675 0.74371859 0.74371859] mean value: 0.7485773773680334 key: test_precision value: [0.65217391 0.63157895 0.78947368 0.68 0.61538462 0.73913043 0.72 0.65 0.69230769 0.71428571] mean value: 0.6884335001383056 key: train_precision value: [0.73205742 0.75510204 0.71090047 0.735 0.73631841 0.73631841 0.72277228 0.75520833 0.72195122 0.72195122] mean value: 0.7327579796523762 key: test_recall value: [0.71428571 0.57142857 0.71428571 0.80952381 0.76190476 0.77272727 0.81818182 0.59090909 0.81818182 0.68181818] mean value: 0.7253246753246754 key: train_recall value: [0.78865979 0.7628866 0.77319588 0.75773196 0.7628866 0.76683938 0.75647668 0.75129534 0.76683938 0.76683938] mean value: 0.7653650980182682 key: test_roc_auc value: [0.67532468 0.62662338 0.76623377 0.72294372 0.65367965 0.74350649 0.74242424 0.62878788 0.71861472 0.69805195] mean value: 0.6976190476190477 key: train_roc_auc value: [0.74925218 0.75709097 0.72856685 0.74156028 0.7441376 0.74682175 0.73390845 0.75451365 0.73651247 0.73651247] mean value: 0.7428876662571444 key: test_jcc value: [0.51724138 0.42857143 0.6 0.5862069 0.51612903 0.60714286 0.62068966 0.44827586 0.6 0.53571429] mean value: 0.5459971396790084 key: train_jcc value: [0.612 0.61157025 0.58823529 0.5951417 0.59919028 0.60162602 0.58634538 0.60416667 0.592 0.592 ] mean value: 0.5982275590310133 MCC on Blind test: 0.48 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01586819 0.0141995 0.02298093 0.02030969 0.01366186 0.0336926 0.01359272 0.03401446 0.01413941 0.02369332] mean value: 0.020615267753601074 key: score_time value: [0.03825784 0.03617787 0.05568075 0.05602121 0.04199529 0.04548907 0.04207611 0.04852986 0.06230307 0.03881931] mean value: 0.04653503894805908 key: test_mcc value: [0.34859132 0.3030303 0.44155844 0.35748709 0.16887427 0.2581351 0.25541126 0.16485939 0.11404496 0.35748709] mean value: 0.2769479215358097 key: train_mcc value: [0.52475891 0.4780321 0.49373354 0.48437399 0.52463135 0.48366935 0.52454463 0.51449492 0.55103485 0.49958596] mean value: 0.5078859601431154 key: test_accuracy value: [0.6744186 0.65116279 0.72093023 0.6744186 0.58139535 0.62790698 0.62790698 0.58139535 0.55813953 0.6744186 ] mean value: 0.6372093023255814 key: train_accuracy value: [0.7622739 0.73901809 0.74677003 0.74160207 0.7622739 0.74160207 0.7622739 0.75710594 0.7751938 0.74935401] mean value: 0.7537467700258398 key: test_fscore value: [0.65 0.65116279 0.71428571 0.69565217 0.60869565 0.61904762 0.63636364 0.57142857 0.59574468 0.65 ] mean value: 0.6392380838761236 key: train_fscore value: [0.76649746 0.7403599 0.75126904 0.75124378 0.76530612 0.74619289 0.76165803 0.76020408 0.77974684 0.75566751] mean value: 0.757814564603969 key: test_precision value: [0.68421053 0.63636364 0.71428571 0.64 0.56 0.65 0.63636364 0.6 0.56 0.72222222] mean value: 0.6403445735550999 key: train_precision value: [0.755 0.73846154 0.74 0.72596154 0.75757576 0.73134328 0.76165803 0.74874372 0.76237624 0.73529412] mean value: 0.7456414223032793 key: test_recall value: [0.61904762 0.66666667 0.71428571 0.76190476 0.66666667 0.59090909 0.63636364 0.54545455 0.63636364 0.59090909] mean value: 0.6428571428571428 key: train_recall value: [0.77835052 0.74226804 0.7628866 0.77835052 0.77319588 0.76165803 0.76165803 0.77202073 0.79792746 0.77720207] mean value: 0.7705517867635276 key: test_roc_auc value: [0.67316017 0.65151515 0.72077922 0.67640693 0.58333333 0.62878788 0.62770563 0.58225108 0.55627706 0.67640693] mean value: 0.6376623376623376 key: train_roc_auc value: [0.76223225 0.73900967 0.74672827 0.74150686 0.76224561 0.74165376 0.76227231 0.75714438 0.77525239 0.74942578] mean value: 0.7537471288926874 key: test_jcc value: [0.48148148 0.48275862 0.55555556 0.53333333 0.4375 0.44827586 0.46666667 0.4 0.42424242 0.48148148] mean value: 0.4711295425519563 key: train_jcc value: [0.62139918 0.5877551 0.60162602 0.60159363 0.61983471 0.5951417 0.61506276 0.61316872 0.63900415 0.60728745] mean value: 0.6101873416458796 MCC on Blind test: 0.15 Accuracy on Blind test: 0.58 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02942824 0.03070235 0.03118563 0.03116775 0.03088856 0.03088307 0.03110266 0.03078842 0.03027916 0.03081298] mean value: 0.030723881721496583 key: score_time value: [0.01875687 0.01874995 0.0188489 0.01890826 0.01899743 0.01892638 0.0193913 0.01888132 0.01872396 0.01879382] mean value: 0.018897819519042968 key: test_mcc value: [0.58225108 0.58824786 0.72077922 0.58824786 0.4517935 0.58824786 0.58557701 0.55959928 0.41330345 0.34859132] mean value: 0.5426638435616064 key: train_mcc value: [0.72193009 0.70123323 0.69518417 0.71095739 0.69047778 0.69557211 0.72617268 0.7109111 0.71656987 0.71625569] mean value: 0.7085264102208928 key: test_accuracy value: [0.79069767 0.79069767 0.86046512 0.79069767 0.72093023 0.79069767 0.79069767 0.76744186 0.69767442 0.6744186 ] mean value: 0.7674418604651163 key: train_accuracy value: [0.86046512 0.8501292 0.84754522 0.85529716 0.84496124 0.84754522 0.8630491 0.85529716 0.85788114 0.85788114] mean value: 0.8540051679586563 key: test_fscore value: [0.79069767 0.8 0.85714286 0.8 0.73913043 0.7804878 0.80851064 0.73684211 0.74509804 0.69565217] mean value: 0.775356172791188 key: train_fscore value: [0.85714286 0.84656085 0.84675325 0.85340314 0.84848485 0.84987277 0.8616188 0.85263158 0.86075949 0.86005089] mean value: 0.853727847599906 key: test_precision value: [0.77272727 0.75 0.85714286 0.75 0.68 0.84210526 0.76 0.875 0.65517241 0.66666667] mean value: 0.7608814473487795 key: train_precision value: [0.88043478 0.86956522 0.85340314 0.86702128 0.83168317 0.835 0.86842105 0.86631016 0.84158416 0.845 ] mean value: 0.8558422957749061 key: test_recall value: [0.80952381 0.85714286 0.85714286 0.85714286 0.80952381 0.72727273 0.86363636 0.63636364 0.86363636 0.72727273] mean value: 0.8008658008658008 key: train_recall value: [0.83505155 0.82474227 0.84020619 0.84020619 0.86597938 0.86528497 0.85492228 0.83937824 0.88082902 0.87564767] mean value: 0.8522247743176112 key: test_roc_auc value: [0.79112554 0.79220779 0.86038961 0.79220779 0.72294372 0.79220779 0.78896104 0.77056277 0.69372294 0.67316017] mean value: 0.7677489177489177 key: train_roc_auc value: [0.86053095 0.85019497 0.84756423 0.85533625 0.84490679 0.84759094 0.86302815 0.85525613 0.85794028 0.85792693] mean value: 0.8540275626302014 key: test_jcc value: [0.65384615 0.66666667 0.75 0.66666667 0.5862069 0.64 0.67857143 0.58333333 0.59375 0.53333333] mean value: 0.6352374478969307 key: train_jcc value: [0.75 0.73394495 0.73423423 0.74429224 0.73684211 0.73893805 0.75688073 0.74311927 0.75555556 0.75446429] mean value: 0.7448271425435942 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [3.31610084 3.66471815 3.69096637 2.90685415 3.13831234 3.5732975 3.3353126 3.1690793 3.41401672 2.58294654] mean value: 3.2791604518890383 key: score_time value: [0.02754307 0.02450562 0.03252292 0.0326395 0.03124905 0.02395105 0.0236342 0.03183818 0.03920364 0.01291871] mean value: 0.028000593185424805 key: test_mcc value: [0.62770563 0.58824786 0.4912706 0.58134627 0.44227524 0.79001638 0.72077922 0.68193178 0.44227524 0.40088002] mean value: 0.5766728235571799 key: train_mcc value: [0.9741727 0.96899204 0.9638374 0.97417339 0.96899204 0.9638374 0.96904463 0.97417339 0.98450896 0.98461498] mean value: 0.9726346929348463 key: test_accuracy value: [0.81395349 0.79069767 0.74418605 0.79069767 0.72093023 0.88372093 0.86046512 0.8372093 0.72093023 0.69767442] mean value: 0.786046511627907 key: train_accuracy value: [0.9870801 0.98449612 0.98191214 0.9870801 0.98449612 0.98191214 0.98449612 0.9870801 0.99224806 0.99224806] mean value: 0.9863049095607235 key: test_fscore value: [0.80952381 0.8 0.71794872 0.7804878 0.7 0.87179487 0.86363636 0.82926829 0.73913043 0.68292683] mean value: 0.779471712451564 key: train_fscore value: [0.98714653 0.98453608 0.98191214 0.9870801 0.98453608 0.98191214 0.98453608 0.9870801 0.99220779 0.99228792] mean value: 0.9863234983055275 key: test_precision value: [0.80952381 0.75 0.77777778 0.8 0.73684211 1. 0.86363636 0.89473684 0.70833333 0.73684211] mean value: 0.8077692336902863 key: train_precision value: [0.98461538 0.98453608 0.98445596 0.98963731 0.98453608 0.97938144 0.97948718 0.98453608 0.99479167 0.98469388] mean value: 0.9850671063290606 key: test_recall value: [0.80952381 0.85714286 0.66666667 0.76190476 0.66666667 0.77272727 0.86363636 0.77272727 0.77272727 0.63636364] mean value: 0.758008658008658 key: train_recall value: [0.98969072 0.98453608 0.97938144 0.98453608 0.98453608 0.98445596 0.98963731 0.98963731 0.98963731 1. ] mean value: 0.9876048288018803 key: test_roc_auc value: [0.81385281 0.79220779 0.74242424 0.79004329 0.71969697 0.88636364 0.86038961 0.83874459 0.71969697 0.6991342 ] mean value: 0.7862554112554112 key: train_roc_auc value: [0.98707334 0.98449602 0.9819187 0.98708669 0.98449602 0.9819187 0.98450937 0.98708669 0.99224133 0.99226804] mean value: 0.9863094920143155 key: test_jcc value: [0.68 0.66666667 0.56 0.64 0.53846154 0.77272727 0.76 0.70833333 0.5862069 0.51851852] mean value: 0.6430914226259054 key: train_jcc value: [0.97461929 0.96954315 0.96446701 0.9744898 0.96954315 0.96446701 0.96954315 0.9744898 0.98453608 0.98469388] mean value: 0.9730392292978733 MCC on Blind test: 0.34 Accuracy on Blind test: 0.67 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.03067994 0.02926111 0.02483511 0.0242579 0.03077769 0.02583909 0.02731919 0.02467728 0.02925134 0.02427554] mean value: 0.027117419242858886 key: score_time value: [0.01235723 0.01009178 0.00929546 0.00893426 0.0089798 0.00907421 0.00918913 0.00918341 0.00910568 0.00909543] mean value: 0.0095306396484375 key: test_mcc value: [0.53463203 0.53463203 0.72451364 0.53595916 0.53796222 0.64040632 0.34848485 0.44468651 0.40088002 0.26318068] mean value: 0.4965337462592751 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.76744186 0.76744186 0.86046512 0.76744186 0.76744186 0.81395349 0.6744186 0.72093023 0.69767442 0.62790698] mean value: 0.7465116279069768 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76190476 0.76190476 0.86363636 0.75 0.77272727 0.8 0.68181818 0.71428571 0.68292683 0.6 ] mean value: 0.7389203885545349 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76190476 0.76190476 0.82608696 0.78947368 0.73913043 0.88888889 0.68181818 0.75 0.73684211 0.66666667] mean value: 0.7602716441961292 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.76190476 0.76190476 0.9047619 0.71428571 0.80952381 0.72727273 0.68181818 0.68181818 0.63636364 0.54545455] mean value: 0.7225108225108225 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76731602 0.76731602 0.86147186 0.76623377 0.76839827 0.81601732 0.67424242 0.72186147 0.6991342 0.62987013] mean value: 0.7471861471861472 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.61538462 0.61538462 0.76 0.6 0.62962963 0.66666667 0.51724138 0.55555556 0.51851852 0.42857143] mean value: 0.5906952409021374 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.47 Accuracy on Blind test: 0.73 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.11986899 0.12324095 0.12095547 0.11996508 0.12497091 0.11812854 0.11927009 0.11958623 0.12523746 0.1254375 ] mean value: 0.12166612148284912 key: score_time value: [0.01776338 0.0179534 0.01831174 0.0179131 0.01776171 0.01782417 0.01797843 0.01762795 0.01887631 0.01881528] mean value: 0.018082547187805175 key: test_mcc value: [0.53595916 0.58557701 0.72077922 0.63123793 0.48917749 0.58557701 0.48807056 0.58225108 0.44701207 0.4633482 ] mean value: 0.5528989731062869 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.76744186 0.79069767 0.86046512 0.81395349 0.74418605 0.79069767 0.74418605 0.79069767 0.72093023 0.72093023] mean value: 0.7744186046511627 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.76923077 0.85714286 0.81818182 0.74418605 0.80851064 0.75555556 0.79069767 0.75 0.68421053] mean value: 0.7727715885654894 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.78947368 0.83333333 0.85714286 0.7826087 0.72727273 0.76 0.73913043 0.80952381 0.69230769 0.8125 ] mean value: 0.7803293234225729 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.71428571 0.71428571 0.85714286 0.85714286 0.76190476 0.86363636 0.77272727 0.77272727 0.81818182 0.59090909] mean value: 0.7722943722943723 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76623377 0.78896104 0.86038961 0.81493506 0.74458874 0.78896104 0.74350649 0.79112554 0.71861472 0.72402597] mean value: 0.7741341991341991 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.625 0.75 0.69230769 0.59259259 0.67857143 0.60714286 0.65384615 0.6 0.52 ] mean value: 0.6319460724460725 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0113728 0.01004457 0.01123762 0.01008534 0.01026392 0.01027369 0.01014042 0.01014757 0.01015282 0.01003885] mean value: 0.010375761985778808 key: score_time value: [0.00955272 0.00899649 0.00898719 0.00902295 0.00880337 0.00935388 0.00888276 0.00890851 0.00882626 0.0089283 ] mean value: 0.009026241302490235 key: test_mcc value: [0.07158368 0.20824344 0.45629995 0.48917749 0.39696419 0.16887427 0.58225108 0.58134627 0.58134627 0.21908017] mean value: 0.3755166816788424 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.53488372 0.60465116 0.72093023 0.74418605 0.69767442 0.58139535 0.79069767 0.79069767 0.79069767 0.60465116] mean value: 0.686046511627907 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.54545455 0.58536585 0.66666667 0.74418605 0.66666667 0.55 0.79069767 0.8 0.8 0.56410256] mean value: 0.6713140017479212 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.52173913 0.6 0.8 0.72727273 0.72222222 0.61111111 0.80952381 0.7826087 0.7826087 0.64705882] mean value: 0.7004145215398413 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.57142857 0.57142857 0.57142857 0.76190476 0.61904762 0.5 0.77272727 0.81818182 0.81818182 0.5 ] mean value: 0.6504329004329005 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.53571429 0.6038961 0.71753247 0.74458874 0.69588745 0.58333333 0.79112554 0.79004329 0.79004329 0.60714286] mean value: 0.685930735930736 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.375 0.4137931 0.5 0.59259259 0.5 0.37931034 0.65384615 0.66666667 0.66666667 0.39285714] mean value: 0.5140732670905085 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.72484875 1.69839096 1.69777799 1.69739676 1.70870376 1.71582556 1.71650362 1.70737052 1.68216228 1.69348478] mean value: 1.704246497154236 key: score_time value: [0.09137702 0.09932709 0.09243751 0.09427667 0.093925 0.09772754 0.0989058 0.09075212 0.09389806 0.09104729] mean value: 0.0943674087524414 key: test_mcc value: [0.62964308 0.48917749 0.86147186 0.81778934 0.67462198 0.76789769 0.68193178 0.58824786 0.44701207 0.51986413] mean value: 0.647765726359847 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.81395349 0.74418605 0.93023256 0.90697674 0.8372093 0.88372093 0.8372093 0.79069767 0.72093023 0.74418605] mean value: 0.8209302325581396 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.74418605 0.93023256 0.90909091 0.82926829 0.88888889 0.82926829 0.7804878 0.75 0.7027027 ] mean value: 0.8164125495577567 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.84210526 0.72727273 0.90909091 0.86956522 0.85 0.86956522 0.89473684 0.84210526 0.69230769 0.86666667] mean value: 0.8363415798541657 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.76190476 0.76190476 0.95238095 0.95238095 0.80952381 0.90909091 0.77272727 0.72727273 0.81818182 0.59090909] mean value: 0.8056277056277056 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.81277056 0.74458874 0.93073593 0.90800866 0.83658009 0.88311688 0.83874459 0.79220779 0.71861472 0.7478355 ] mean value: 0.8213203463203463 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.59259259 0.86956522 0.83333333 0.70833333 0.8 0.70833333 0.64 0.6 0.54166667] mean value: 0.696049114331723 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.57 Accuracy on Blind test: 0.79 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.95613575 1.00466895 0.9780705 0.99141526 1.00166965 1.02008724 0.98691678 1.09668422 1.02446008 1.00150752] mean value: 1.006161594390869 key: score_time value: [0.22908759 0.2125628 0.1826551 0.24825335 0.14801216 0.20393419 0.1265645 0.21994519 0.20560575 0.27162194] mean value: 0.20482425689697265 key: test_mcc value: [0.67462198 0.48917749 0.81778934 0.72077922 0.76789769 0.72077922 0.67532468 0.50454827 0.48807056 0.51986413] mean value: 0.6378852573856032 key: train_mcc value: [0.89158365 0.8914826 0.88630415 0.8914826 0.87081007 0.88635453 0.88635453 0.87596816 0.88143837 0.89683728] mean value: 0.8858615950471861 key: test_accuracy value: [0.8372093 0.74418605 0.90697674 0.86046512 0.88372093 0.86046512 0.8372093 0.74418605 0.74418605 0.74418605] mean value: 0.8162790697674418 key: train_accuracy value: [0.94573643 0.94573643 0.94315245 0.94573643 0.93540052 0.94315245 0.94315245 0.9379845 0.94056848 0.94832041] mean value: 0.9428940568475452 key: test_fscore value: [0.82926829 0.74418605 0.90909091 0.85714286 0.87804878 0.86363636 0.8372093 0.71794872 0.75555556 0.7027027 ] mean value: 0.8094789528085047 key: train_fscore value: [0.94545455 0.94601542 0.94329897 0.94601542 0.93573265 0.94329897 0.94329897 0.93782383 0.94117647 0.94871795] mean value: 0.9430833202318074 key: test_precision value: [0.85 0.72727273 0.86956522 0.85714286 0.9 0.86363636 0.85714286 0.82352941 0.73913043 0.86666667] mean value: 0.835408653580009 key: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( train_precision value: [0.95287958 0.94358974 0.94329897 0.94358974 0.93333333 0.93846154 0.93846154 0.93782383 0.92929293 0.93908629] mean value: 0.9399817505565959 key: test_recall value: [0.80952381 0.76190476 0.95238095 0.85714286 0.85714286 0.86363636 0.81818182 0.63636364 0.77272727 0.59090909] mean value: 0.791991341991342 key: train_recall value: [0.93814433 0.94845361 0.94329897 0.94845361 0.93814433 0.94818653 0.94818653 0.93782383 0.95336788 0.95854922] mean value: 0.9462608834998131 key: test_roc_auc value: [0.83658009 0.74458874 0.90800866 0.86038961 0.88311688 0.86038961 0.83766234 0.74675325 0.74350649 0.7478355 ] mean value: 0.8168831168831169 key: train_roc_auc value: [0.9457561 0.94572939 0.94315208 0.94572939 0.93539341 0.94316543 0.94316543 0.93798408 0.94060146 0.94834678] mean value: 0.9429023556433951 key: test_jcc value: [0.70833333 0.59259259 0.83333333 0.75 0.7826087 0.76 0.72 0.56 0.60714286 0.54166667] mean value: 0.6855677478720957 key: train_jcc value: [0.89655172 0.89756098 0.89268293 0.89756098 0.87922705 0.89268293 0.89268293 0.88292683 0.88888889 0.90243902] mean value: 0.892320425153277 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01284814 0.01265669 0.012532 0.01183605 0.01195717 0.01215744 0.01211309 0.01218128 0.01174545 0.01174068] mean value: 0.012176799774169921 key: score_time value: [0.01120496 0.01093507 0.01121926 0.01031542 0.01077127 0.01087213 0.01022887 0.01016021 0.01030135 0.01021433] mean value: 0.010622286796569824 key: test_mcc value: [0.35141081 0.25490741 0.53595916 0.4517935 0.31423621 0.48807056 0.4912706 0.2581351 0.44701207 0.3961039 ] mean value: 0.3988899295731957 key: train_mcc value: [0.50010976 0.51422314 0.45901322 0.48339175 0.48864075 0.49401307 0.46826734 0.50904089 0.47386097 0.47386097] mean value: 0.48644218743566964 key: test_accuracy value: [0.6744186 0.62790698 0.76744186 0.72093023 0.65116279 0.74418605 0.74418605 0.62790698 0.72093023 0.69767442] mean value: 0.6976744186046512 key: train_accuracy value: [0.74935401 0.75710594 0.72868217 0.74160207 0.74418605 0.74677003 0.73385013 0.75452196 0.73643411 0.73643411] mean value: 0.7428940568475452 key: test_fscore value: [0.68181818 0.6 0.75 0.73913043 0.68085106 0.75555556 0.76595745 0.61904762 0.75 0.69767442] mean value: 0.7040034720446915 key: train_fscore value: [0.75930521 0.75897436 0.74074074 0.74619289 0.74936709 0.75126904 0.73924051 0.75324675 0.74371859 0.74371859] mean value: 0.7485773773680334 key: test_precision value: [0.65217391 0.63157895 0.78947368 0.68 0.61538462 0.73913043 0.72 0.65 0.69230769 0.71428571] mean value: 0.6884335001383056 key: train_precision value: [0.73205742 0.75510204 0.71090047 0.735 0.73631841 0.73631841 0.72277228 0.75520833 0.72195122 0.72195122] mean value: 0.7327579796523762 key: test_recall value: [0.71428571 0.57142857 0.71428571 0.80952381 0.76190476 0.77272727 0.81818182 0.59090909 0.81818182 0.68181818] mean value: 0.7253246753246754 key: train_recall value: [0.78865979 0.7628866 0.77319588 0.75773196 0.7628866 0.76683938 0.75647668 0.75129534 0.76683938 0.76683938] mean value: 0.7653650980182682 key: test_roc_auc value: [0.67532468 0.62662338 0.76623377 0.72294372 0.65367965 0.74350649 0.74242424 0.62878788 0.71861472 0.69805195] mean value: 0.6976190476190477 key: train_roc_auc value: [0.74925218 0.75709097 0.72856685 0.74156028 0.7441376 0.74682175 0.73390845 0.75451365 0.73651247 0.73651247] mean value: 0.7428876662571444 key: test_jcc value: [0.51724138 0.42857143 0.6 0.5862069 0.51612903 0.60714286 0.62068966 0.44827586 0.6 0.53571429] mean value: 0.5459971396790084 key: train_jcc value: [0.612 0.61157025 0.58823529 0.5951417 0.59919028 0.60162602 0.58634538 0.60416667 0.592 0.592 ] mean value: 0.5982275590310133 MCC on Blind test: 0.48 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [1.2328546 0.66477251 1.61751771 0.90823627 0.69649839 0.87800646 0.88173628 0.4864018 0.63253689 1.07032633] mean value: 0.9068887233734131 key: score_time value: [0.01455235 0.01226497 0.01529741 0.01239967 0.01296425 0.01283431 0.01390839 0.01246428 0.01260304 0.01239204] mean value: 0.013168072700500489 key: test_mcc value: [0.72077922 0.53796222 0.81778934 0.723327 0.72451364 0.81778934 0.76839827 0.67532468 0.72077922 0.73471273] mean value: 0.7241375648717405 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86046512 0.76744186 0.90697674 0.86046512 0.86046512 0.90697674 0.88372093 0.8372093 0.86046512 0.86046512] mean value: 0.8604651162790697 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.77272727 0.90909091 0.85 0.86363636 0.9047619 0.88372093 0.8372093 0.86363636 0.85 ] mean value: 0.8591925903553811 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.85714286 0.73913043 0.86956522 0.89473684 0.82608696 0.95 0.9047619 0.85714286 0.86363636 0.94444444] mean value: 0.8706647877929342 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.85714286 0.80952381 0.95238095 0.80952381 0.9047619 0.86363636 0.86363636 0.81818182 0.86363636 0.77272727] mean value: 0.8515151515151516 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86038961 0.76839827 0.90800866 0.85930736 0.86147186 0.90800866 0.88419913 0.83766234 0.86038961 0.86255411] mean value: 0.861038961038961 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.62962963 0.83333333 0.73913043 0.76 0.82608696 0.79166667 0.72 0.76 0.73913043] mean value: 0.7548977455716586 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.64 Accuracy on Blind test: 0.82 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.11340785 0.06507301 0.08535337 0.06547856 0.06026602 0.16849113 0.12170339 0.08560491 0.13529682 0.08245301] mean value: 0.09831280708312988 key: score_time value: [0.03657317 0.02178717 0.01445198 0.0345099 0.01233554 0.01242328 0.02445245 0.01907539 0.01216125 0.02539206] mean value: 0.021316218376159667 key: test_mcc value: [0.53796222 0.35141081 0.39479486 0.53463203 0.58225108 0.58824786 0.58225108 0.54609991 0.30265778 0.31423621] mean value: 0.4734543830252807 key: train_mcc value: [0.78836141 0.85069894 0.79855231 0.78838964 0.81913359 0.79393717 0.79853618 0.7934388 0.80366443 0.80365395] mean value: 0.8038366418937956 key: test_accuracy value: [0.76744186 0.6744186 0.69767442 0.76744186 0.79069767 0.79069767 0.79069767 0.76744186 0.65116279 0.65116279] mean value: 0.7348837209302326 key: train_accuracy value: [0.89405685 0.9250646 0.89922481 0.89405685 0.90956072 0.89664083 0.89922481 0.89664083 0.90180879 0.90180879] mean value: 0.9018087855297158 key: test_fscore value: [0.77272727 0.68181818 0.68292683 0.76190476 0.79069767 0.7804878 0.79069767 0.75 0.68085106 0.61538462] mean value: 0.7307495878648169 key: train_fscore value: [0.8956743 0.92388451 0.8987013 0.89295039 0.90956072 0.89417989 0.89817232 0.89528796 0.90206186 0.90104167] mean value: 0.9011514926942206 key: test_precision value: [0.73913043 0.65217391 0.7 0.76190476 0.77272727 0.84210526 0.80952381 0.83333333 0.64 0.70588235] mean value: 0.7456781141414336 key: train_precision value: [0.88442211 0.94117647 0.90575916 0.9047619 0.9119171 0.91351351 0.90526316 0.9047619 0.8974359 0.90575916] mean value: 0.9074770382561882 key: test_recall value: [0.80952381 0.71428571 0.66666667 0.76190476 0.80952381 0.72727273 0.77272727 0.68181818 0.72727273 0.54545455] mean value: 0.7216450216450216 key: train_recall value: [0.90721649 0.90721649 0.89175258 0.8814433 0.90721649 0.87564767 0.89119171 0.88601036 0.90673575 0.89637306] mean value: 0.895080391004754 key: test_roc_auc value: [0.76839827 0.67532468 0.6969697 0.76731602 0.79112554 0.79220779 0.79112554 0.76948052 0.64935065 0.65367965] mean value: 0.7354978354978355 key: train_roc_auc value: [0.89402276 0.92511084 0.89924416 0.89408953 0.9095668 0.89658672 0.8992041 0.89661343 0.90182148 0.90179478] mean value: 0.9018054591100902 key: test_jcc value: [0.62962963 0.51724138 0.51851852 0.61538462 0.65384615 0.64 0.65384615 0.6 0.51612903 0.44444444] mean value: 0.5789039927237924 key: train_jcc value: [0.81105991 0.85853659 0.81603774 0.80660377 0.83412322 0.80861244 0.81516588 0.81042654 0.82159624 0.81990521] mean value: 0.8202067540037329 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01193261 0.01175404 0.01164556 0.011657 0.0116663 0.011657 0.01157117 0.0117991 0.01158476 0.01160455] mean value: 0.011687207221984863 key: score_time value: [0.02242517 0.01041889 0.01026773 0.01022983 0.01037145 0.01021743 0.01020646 0.01034784 0.01019359 0.01028991] mean value: 0.011496829986572265 key: test_mcc value: [0.3030303 0.34848485 0.64040632 0.49456394 0.40939224 0.39479486 0.49916256 0.25541126 0.4912706 0.30265778] mean value: 0.4139174695448401 key: train_mcc value: [0.43912593 0.43417069 0.41788904 0.42787777 0.42376414 0.43014703 0.40825746 0.45098666 0.43399577 0.46079208] mean value: 0.4327006564068274 key: test_accuracy value: [0.65116279 0.6744186 0.81395349 0.74418605 0.69767442 0.69767442 0.74418605 0.62790698 0.74418605 0.65116279] mean value: 0.7046511627906977 key: train_accuracy value: [0.71834625 0.71576227 0.70801034 0.71317829 0.71059432 0.71317829 0.70284238 0.72351421 0.71576227 0.72868217] mean value: 0.7149870801033592 key: test_fscore value: [0.65116279 0.66666667 0.82608696 0.75555556 0.72340426 0.71111111 0.7755102 0.63636364 0.76595745 0.68085106] mean value: 0.7192669686955463 key: train_fscore value: [0.73349633 0.73170732 0.72235872 0.72592593 0.72682927 0.72992701 0.71744472 0.73965937 0.72906404 0.74327628] mean value: 0.7299688981336869 key: test_precision value: [0.63636364 0.66666667 0.76 0.70833333 0.65384615 0.69565217 0.7037037 0.63636364 0.72 0.64 ] mean value: 0.6820929304190174 key: train_precision value: [0.69767442 0.69444444 0.69014085 0.69668246 0.68981481 0.68807339 0.68224299 0.69724771 0.69483568 0.7037037 ] mean value: 0.6934860463415824 key: test_recall value: [0.66666667 0.66666667 0.9047619 0.80952381 0.80952381 0.72727273 0.86363636 0.63636364 0.81818182 0.72727273] mean value: 0.762987012987013 key: train_recall value: [0.77319588 0.77319588 0.75773196 0.75773196 0.76804124 0.77720207 0.75647668 0.78756477 0.76683938 0.78756477] mean value: 0.7705544575610277 key: test_roc_auc value: [0.65151515 0.67424242 0.81601732 0.745671 0.70021645 0.6969697 0.74134199 0.62770563 0.74242424 0.64935065] mean value: 0.7045454545454546 key: train_roc_auc value: [0.71820416 0.71561348 0.70788152 0.71306287 0.71044549 0.7133433 0.70298061 0.72367929 0.71589392 0.72883393] mean value: 0.7149938571657497 key: test_jcc value: [0.48275862 0.5 0.7037037 0.60714286 0.56666667 0.55172414 0.63333333 0.46666667 0.62068966 0.51612903] mean value: 0.5648814673564395 key: train_jcc value: [0.57915058 0.57692308 0.56538462 0.56976744 0.57088123 0.57471264 0.55938697 0.58687259 0.57364341 0.59143969] mean value: 0.5748162242671867 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01609015 0.01382875 0.01554108 0.01616502 0.02625966 0.01413321 0.01446033 0.01906276 0.01966596 0.0247612 ] mean value: 0.017996811866760255 key: score_time value: [0.01026535 0.00975919 0.01020408 0.01017809 0.01034117 0.01031709 0.01144147 0.01193285 0.01191831 0.01199841] mean value: 0.010835599899291993 key: test_mcc value: [0.61187382 0.40939224 0.51258863 0.39479486 0.21578506 0.38684081 0.59541363 0.41223987 0.26856633 0.31423621] mean value: 0.4121731451940424 key: train_mcc value: [0.64394423 0.69530024 0.60234402 0.64916894 0.46289376 0.54412654 0.61858746 0.48824211 0.6822421 0.68799886] mean value: 0.6074848250415069 key: test_accuracy value: [0.79069767 0.69767442 0.74418605 0.69767442 0.53488372 0.6744186 0.79069767 0.6744186 0.62790698 0.65116279] mean value: 0.6883720930232557 key: train_accuracy value: [0.81136951 0.84754522 0.77260982 0.82428941 0.67700258 0.74418605 0.80620155 0.69509044 0.8372093 0.84237726] mean value: 0.7857881136950905 key: test_fscore value: [0.74285714 0.72340426 0.68571429 0.68292683 0.67741935 0.74074074 0.81632653 0.75 0.69230769 0.61538462] mean value: 0.7127081447042873 key: train_fscore value: [0.78466077 0.84987277 0.7124183 0.82105263 0.75633528 0.78980892 0.81840194 0.76494024 0.84819277 0.83378747] mean value: 0.7979471085693836 key: test_precision value: [0.92857143 0.65384615 0.85714286 0.7 0.51219512 0.625 0.74074074 0.61764706 0.6 0.70588235] mean value: 0.6941025714017106 key: train_precision value: [0.91724138 0.83919598 0.97321429 0.83870968 0.60815047 0.66906475 0.76818182 0.62135922 0.79279279 0.87931034] mean value: 0.7907220719867525 key: test_recall value: [0.61904762 0.80952381 0.57142857 0.66666667 1. 0.90909091 0.90909091 0.95454545 0.81818182 0.54545455] mean value: 0.7803030303030303 key: train_recall value: [0.68556701 0.86082474 0.56185567 0.80412371 1. 0.96373057 0.87564767 0.99481865 0.9119171 0.79274611] mean value: 0.8451231237647562 key: test_roc_auc value: [0.78679654 0.70021645 0.74025974 0.6969697 0.54545455 0.66883117 0.78787879 0.66774892 0.62337662 0.65367965] mean value: 0.6871212121212121 key: train_roc_auc value: [0.81169542 0.84751082 0.77315581 0.82434165 0.6761658 0.74475188 0.80638054 0.69586293 0.83740185 0.84224935] mean value: 0.7859516051492976 key: test_jcc value: [0.59090909 0.56666667 0.52173913 0.51851852 0.51219512 0.58823529 0.68965517 0.6 0.52941176 0.44444444] mean value: 0.5561775204162045 key: train_jcc value: [0.64563107 0.73893805 0.55329949 0.69642857 0.60815047 0.65263158 0.69262295 0.61935484 0.73640167 0.71495327] mean value: 0.6658411968237227 MCC on Blind test: 0.52 Accuracy on Blind test: 0.76 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03140759 0.0151937 0.01848125 0.01726508 0.01619983 0.02581859 0.0273428 0.02270555 0.01659775 0.02209139] mean value: 0.02131035327911377 key: score_time value: [0.01060104 0.01054025 0.01031375 0.01028848 0.01018238 0.01028967 0.01034975 0.01032352 0.01027846 0.01031113] mean value: 0.010347843170166016 key: test_mcc value: [0.59970431 0.4633482 0.75210143 0.57282196 0.4517935 0.61187382 0.67462198 0.36986766 0.20835137 0.40041988] mean value: 0.5104904109207444 key: train_mcc value: [0.65519777 0.73263194 0.72908637 0.5719457 0.69442892 0.63950439 0.69510176 0.57728429 0.74307158 0.73224955] mean value: 0.6770502267008328 key: test_accuracy value: [0.79069767 0.72093023 0.86046512 0.74418605 0.72093023 0.79069767 0.8372093 0.6744186 0.60465116 0.6744186 ] mean value: 0.741860465116279 key: train_accuracy value: [0.80620155 0.86563307 0.85788114 0.74935401 0.83979328 0.80103359 0.84754522 0.75452196 0.87080103 0.85788114] mean value: 0.8250645994832041 key: test_fscore value: [0.80851064 0.75 0.875 0.79245283 0.73913043 0.82352941 0.84444444 0.73076923 0.63829787 0.58823529] mean value: 0.7590370156705615 key: train_fscore value: [0.83588621 0.87 0.87058824 0.79917184 0.85514019 0.82926829 0.84754522 0.80083857 0.87437186 0.84057971] mean value: 0.8423390135488182 key: test_precision value: [0.73076923 0.66666667 0.77777778 0.65625 0.68 0.72413793 0.82608696 0.63333333 0.6 0.83333333] mean value: 0.7128355229436564 key: train_precision value: [0.72623574 0.84466019 0.8008658 0.66782007 0.78205128 0.7248062 0.84536082 0.67253521 0.84878049 0.95394737] mean value: 0.7867063181527051 key: test_recall value: [0.9047619 0.85714286 1. 1. 0.80952381 0.95454545 0.86363636 0.86363636 0.68181818 0.45454545] mean value: 0.8389610389610389 key: train_recall value: [0.98453608 0.89690722 0.95360825 0.99484536 0.94329897 0.96891192 0.84974093 0.98963731 0.9015544 0.75129534] mean value: 0.9234335772661717 key: test_roc_auc value: [0.79329004 0.72402597 0.86363636 0.75 0.72294372 0.78679654 0.83658009 0.66991342 0.60281385 0.67965368] mean value: 0.7429653679653679 key: train_roc_auc value: [0.80573954 0.86555205 0.85763314 0.74871802 0.83952513 0.80146627 0.84755088 0.75512793 0.87088029 0.85760643] mean value: 0.8249799690187489 key: test_jcc value: [0.67857143 0.6 0.77777778 0.65625 0.5862069 0.7 0.73076923 0.57575758 0.46875 0.41666667] mean value: 0.6190749576094403 key: train_jcc value: [0.71804511 0.7699115 0.77083333 0.66551724 0.74693878 0.70833333 0.73542601 0.66783217 0.77678571 0.725 ] mean value: 0.7284623191849406 MCC on Blind test: 0.53 Accuracy on Blind test: 0.76 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.18568873 0.18229556 0.18312979 0.18038273 0.17975068 0.47091389 0.39590287 0.30203557 0.16115618 0.18954515] mean value: 0.24308011531829835 key: score_time value: [0.0176034 0.01771164 0.0178442 0.01760936 0.01758552 0.04107261 0.0410192 0.01772761 0.01576948 0.02026224] mean value: 0.022420525550842285 key: test_mcc value: [0.62964308 0.64040632 0.67462198 0.44468651 0.53595916 0.72451364 0.67532468 0.58225108 0.62770563 0.54609991] mean value: 0.6081211972694571 key: train_mcc value: [0.9225879 0.93803584 0.94877223 0.94836935 0.91735891 0.92785021 0.92249346 0.93282944 0.93313211 0.94316543] mean value: 0.933459487609339 key: test_accuracy value: [0.81395349 0.81395349 0.8372093 0.72093023 0.76744186 0.86046512 0.8372093 0.79069767 0.81395349 0.76744186] mean value: 0.8023255813953488 key: train_accuracy value: [0.96124031 0.96899225 0.97416021 0.97416021 0.95865633 0.96382429 0.96124031 0.96640827 0.96640827 0.97157623] mean value: 0.9666666666666667 key: test_fscore value: [0.8 0.82608696 0.82926829 0.72727273 0.75 0.85714286 0.8372093 0.79069767 0.81818182 0.75 ] mean value: 0.7985859628546255 key: train_fscore value: [0.96163683 0.96891192 0.97461929 0.97435897 0.95897436 0.96410256 0.96124031 0.96640827 0.96675192 0.97157623] mean value: 0.9668580656879063 key: test_precision value: [0.84210526 0.76 0.85 0.69565217 0.78947368 0.9 0.85714286 0.80952381 0.81818182 0.83333333] mean value: 0.8155412939463282 key: train_precision value: [0.95431472 0.97395833 0.96 0.96938776 0.95408163 0.95431472 0.95876289 0.96391753 0.95454545 0.96907216] mean value: 0.9612355194577843 key: test_recall value: [0.76190476 0.9047619 0.80952381 0.76190476 0.71428571 0.81818182 0.81818182 0.77272727 0.81818182 0.68181818] mean value: 0.7861471861471861 key: train_recall value: [0.96907216 0.96391753 0.98969072 0.97938144 0.96391753 0.97409326 0.96373057 0.96891192 0.97927461 0.97409326] mean value: 0.9726083008386304 key: test_roc_auc value: [0.81277056 0.81601732 0.83658009 0.72186147 0.76623377 0.86147186 0.83766234 0.79112554 0.81385281 0.76948052] mean value: 0.8027056277056277 key: train_roc_auc value: [0.96122002 0.9690054 0.97411997 0.97414668 0.9586427 0.96385076 0.96124673 0.96641472 0.96644143 0.97158271] mean value: 0.9666671117995833 key: test_jcc value: [0.66666667 0.7037037 0.70833333 0.57142857 0.6 0.75 0.72 0.65384615 0.69230769 0.6 ] mean value: 0.6666286121286121 key: train_jcc value: [0.92610837 0.93969849 0.95049505 0.95 0.92118227 0.93069307 0.92537313 0.935 0.93564356 0.94472362] mean value: 0.9358917568443528 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.14063168 0.10446286 0.10352421 0.12305665 0.09840965 0.05504537 0.05416179 0.06609821 0.05265355 0.06720805] mean value: 0.08652520179748535 key: score_time value: [0.02307844 0.02167487 0.02694726 0.02493787 0.02693772 0.02003384 0.01963663 0.02375698 0.01780176 0.02739167] mean value: 0.023219704627990723 key: test_mcc value: [0.63732414 0.53463203 0.723327 0.62964308 0.62770563 0.82901914 0.58824786 0.58824786 0.65585036 0.61748053] mean value: 0.6431477616870775 key: train_mcc value: [0.96904298 0.96904463 0.98450937 0.94316543 0.94832007 0.96393847 0.96899204 0.98450896 0.9741727 0.97427611] mean value: 0.9679970760915598 key: test_accuracy value: [0.81395349 0.76744186 0.86046512 0.81395349 0.81395349 0.90697674 0.79069767 0.79069767 0.81395349 0.79069767] mean value: 0.8162790697674418 key: train_accuracy value: [0.98449612 0.98449612 0.99224806 0.97157623 0.97416021 0.98191214 0.98449612 0.99224806 0.9870801 0.9870801 ] mean value: 0.9839793281653747 key: test_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [0.78947368 0.76190476 0.85 0.8 0.80952381 0.9 0.7804878 0.7804878 0.78947368 0.75675676] mean value: 0.8018108306362478 key: train_fscore value: [0.98461538 0.98445596 0.99224806 0.97157623 0.9742268 0.98172324 0.98445596 0.99220779 0.98701299 0.98694517] mean value: 0.983946758177471 key: test_precision value: [0.88235294 0.76190476 0.89473684 0.84210526 0.80952381 1. 0.84210526 0.84210526 0.9375 0.93333333] mean value: 0.8745667477517323 key: train_precision value: [0.97959184 0.98958333 0.99481865 0.97409326 0.9742268 0.98947368 0.98445596 0.99479167 0.98958333 0.99473684] mean value: 0.9865355376155196 key: test_recall value: [0.71428571 0.76190476 0.80952381 0.76190476 0.80952381 0.81818182 0.72727273 0.72727273 0.68181818 0.63636364] mean value: 0.7448051948051948 key: train_recall value: [0.98969072 0.97938144 0.98969072 0.96907216 0.9742268 0.97409326 0.98445596 0.98963731 0.98445596 0.97927461] mean value: 0.9813978954115699 key: test_roc_auc value: [0.81168831 0.76731602 0.85930736 0.81277056 0.81385281 0.90909091 0.79220779 0.79220779 0.81709957 0.79437229] mean value: 0.8169913419913419 key: train_roc_auc value: [0.98448267 0.98450937 0.99225469 0.97158271 0.97416003 0.98189199 0.98449602 0.99224133 0.98707334 0.98705999] mean value: 0.9839752149991988 key: test_jcc value: [0.65217391 0.61538462 0.73913043 0.66666667 0.68 0.81818182 0.64 0.64 0.65217391 0.60869565] mean value: 0.6712407013276579 key: train_jcc value: [0.96969697 0.96938776 0.98461538 0.94472362 0.94974874 0.96410256 0.96938776 0.98453608 0.97435897 0.9742268 ] mean value: 0.9684784651384958 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.19252563 0.15471125 0.11138511 0.15535378 0.13108945 0.1718359 0.15187907 0.18106008 0.18106508 0.21381736] mean value: 0.16447227001190184 key: score_time value: [0.02305579 0.02334285 0.01446009 0.02341747 0.01428199 0.02465487 0.02461863 0.02343106 0.02325463 0.04997849] mean value: 0.024449586868286133 key: test_mcc value: [0.44227524 0.34848485 0.62770563 0.54609991 0.34848485 0.3030303 0.58134627 0.35748709 0.2567 0.55959928] mean value: 0.4371213413695219 key: train_mcc value: [0.97427816 0.97427816 0.97938089 0.97427816 0.97417339 0.9741727 0.97427611 0.96414361 0.97937979 0.97937979] mean value: 0.9747740775759611 key: test_accuracy value: [0.72093023 0.6744186 0.81395349 0.76744186 0.6744186 0.65116279 0.79069767 0.6744186 0.62790698 0.76744186] mean value: 0.7162790697674418 key: train_accuracy value: [0.9870801 0.9870801 0.98966408 0.9870801 0.9870801 0.9870801 0.9870801 0.98191214 0.98966408 0.98966408] mean value: 0.9873385012919897 key: test_fscore value: [0.7 0.66666667 0.80952381 0.7826087 0.66666667 0.65116279 0.8 0.65 0.66666667 0.73684211] mean value: 0.7130137401136816 key: train_fscore value: [0.98701299 0.98701299 0.98963731 0.98701299 0.9870801 0.98701299 0.98694517 0.9816273 0.98958333 0.98958333] mean value: 0.987250849007799 key: test_precision value: [0.73684211 0.66666667 0.80952381 0.72 0.66666667 0.66666667 0.7826087 0.72222222 0.61538462 0.875 ] mean value: 0.7261581448045978 key: train_precision value: [0.9947644 0.9947644 0.99479167 0.9947644 0.98963731 0.98958333 0.99473684 0.99468085 0.9947644 0.9947644 ] mean value: 0.9937251988397371 key: test_recall value: [0.66666667 0.66666667 0.80952381 0.85714286 0.66666667 0.63636364 0.81818182 0.59090909 0.72727273 0.63636364] mean value: 0.7075757575757575 key: train_recall value: [0.97938144 0.97938144 0.98453608 0.97938144 0.98453608 0.98445596 0.97927461 0.96891192 0.98445596 0.98445596] mean value: 0.9808770898990439 key: test_roc_auc value: [0.71969697 0.67424242 0.81385281 0.76948052 0.67424242 0.65151515 0.79004329 0.67640693 0.62554113 0.77056277] mean value: 0.7165584415584415 key: train_roc_auc value: [0.98710005 0.98710005 0.98967737 0.98710005 0.98708669 0.98707334 0.98705999 0.98187864 0.98965066 0.98965066] mean value: 0.9873377490518669 key: test_jcc value: [0.53846154 0.5 0.68 0.64285714 0.5 0.48275862 0.66666667 0.48148148 0.5 0.58333333] mean value: 0.5575558783489818 key: train_jcc value: [0.97435897 0.97435897 0.97948718 0.97435897 0.9744898 0.97435897 0.9742268 0.96391753 0.97938144 0.97938144] mean value: 0.974832008933629 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.62265253 0.64571428 0.66495323 0.91781998 1.16082621 1.02426267 0.69185972 0.61647153 0.62877178 0.62057972] mean value: 0.7593911647796631 key: score_time value: [0.00941133 0.00966191 0.01070642 0.01257944 0.0127492 0.03311658 0.00940585 0.01045775 0.01020002 0.00968361] mean value: 0.012797212600708008 key: test_mcc value: [0.53463203 0.53796222 0.86147186 0.67988342 0.7756157 0.81778934 0.7756157 0.723327 0.73471273 0.69486034] mean value: 0.7135870332283394 key: train_mcc value: [0.99484522 1. 1. 1. 1. 0.99484536 1. 1. 1. 1. ] mean value: 0.9989690584330672 key: test_accuracy value: [0.76744186 0.76744186 0.93023256 0.8372093 0.88372093 0.90697674 0.88372093 0.86046512 0.86046512 0.8372093 ] mean value: 0.8534883720930233 key: train_accuracy value: [0.99741602 1. 1. 1. 1. 0.99741602 1. 1. 1. 1. ] mean value: 0.999483204134367 key: test_fscore value: [0.76190476 0.77272727 0.93023256 0.82051282 0.88888889 0.9047619 0.87804878 0.86956522 0.85 0.82051282] mean value: 0.8497155025327113 key: train_fscore value: [0.99742931 1. 1. 1. 1. 0.99741602 1. 1. 1. 1. ] mean value: 0.9994845326584431 key: test_precision value: [0.76190476 0.73913043 0.90909091 0.88888889 0.83333333 0.95 0.94736842 0.83333333 0.94444444 0.94117647] mean value: 0.8748670997419147 key: train_precision value: [0.99487179 1. 1. 1. 1. 0.99484536 1. 1. 1. 1. ] mean value: 0.9989717155696537 key: test_recall value: [0.76190476 0.80952381 0.95238095 0.76190476 0.95238095 0.86363636 0.81818182 0.90909091 0.77272727 0.72727273] mean value: 0.8329004329004329 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76731602 0.76839827 0.93073593 0.83549784 0.88528139 0.90800866 0.88528139 0.85930736 0.86255411 0.83982684] mean value: 0.8542207792207792 key: train_roc_auc value: [0.99740933 1. 1. 1. 1. 0.99742268 1. 1. 1. 1. ] mean value: 0.9994832006837242 key: test_jcc value: [0.61538462 0.62962963 0.86956522 0.69565217 0.8 0.82608696 0.7826087 0.76923077 0.73913043 0.69565217] mean value: 0.7422940666418927 key: train_jcc value: [0.99487179 1. 1. 1. 1. 0.99484536 1. 1. 1. 1. ] mean value: 0.9989717155696537 MCC on Blind test: 0.52 Accuracy on Blind test: 0.76 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03295922 0.05415583 0.05406284 0.05276442 0.06451917 0.03202033 0.03230309 0.0328548 0.03241825 0.03225565] mean value: 0.042031359672546384 key: score_time value: [0.01492977 0.02648234 0.02877092 0.03977489 0.01995325 0.01646137 0.02642083 0.01375341 0.01634121 0.01620936] mean value: 0.0219097375869751 key: test_mcc value: [0.28169285 0.44701207 0.2270149 0.53463203 0.30265778 0.38684081 0.26106714 0.53595916 0.17358241 0.20824344] mean value: 0.33587025909950874 key: train_mcc value: [0.72548957 0.76710504 0.85389622 0.93292554 0.92769572 0.69709662 0.66867145 0.72588852 0.92048062 0.8179781 ] mean value: 0.8037227392706445 key: test_accuracy value: [0.62790698 0.72093023 0.60465116 0.76744186 0.65116279 0.6744186 0.62790698 0.76744186 0.58139535 0.60465116] mean value: 0.6627906976744186 key: train_accuracy value: [0.84496124 0.87855297 0.92248062 0.96640827 0.96382429 0.82687339 0.80878553 0.84496124 0.95865633 0.90180879] mean value: 0.8917312661498707 key: test_fscore value: [0.68 0.68421053 0.65306122 0.76190476 0.61538462 0.74074074 0.68 0.7826087 0.66666667 0.62222222] mean value: 0.6886799453376766 key: train_fscore value: [0.86607143 0.86834734 0.92788462 0.96675192 0.96410256 0.85209713 0.83913043 0.86547085 0.960199 0.90995261] mean value: 0.9020007893806317 key: test_precision value: [0.5862069 0.76470588 0.57142857 0.76190476 0.66666667 0.625 0.60714286 0.75 0.5625 0.60869565] mean value: 0.6504251288221435 key: train_precision value: [0.76377953 0.95092025 0.86936937 0.95939086 0.95918367 0.74230769 0.72284644 0.76284585 0.92344498 0.83842795] mean value: 0.8492516586473186 key: test_recall value: [0.80952381 0.61904762 0.76190476 0.76190476 0.57142857 0.90909091 0.77272727 0.81818182 0.81818182 0.63636364] mean value: 0.7478354978354979 key: train_recall value: [1. 0.79896907 0.99484536 0.9742268 0.96907216 1. 1. 1. 1. 0.99481865] mean value: 0.9731932054911596 key: test_roc_auc value: [0.63203463 0.71861472 0.60822511 0.76731602 0.64935065 0.66883117 0.62445887 0.76623377 0.57575758 0.6038961 ] mean value: 0.6614718614718614 key: train_roc_auc value: [0.84455959 0.87875915 0.92229315 0.96638801 0.96381069 0.82731959 0.80927835 0.84536082 0.95876289 0.9020485 ] mean value: 0.8918580738208429 key: test_jcc value: [0.51515152 0.52 0.48484848 0.61538462 0.44444444 0.58823529 0.51515152 0.64285714 0.5 0.4516129 ] mean value: 0.5277685915181172 key: train_jcc value: [0.76377953 0.76732673 0.86547085 0.93564356 0.93069307 0.74230769 0.72284644 0.76284585 0.92344498 0.83478261] mean value: 0.8249141314743462 MCC on Blind test: 0.07 Accuracy on Blind test: 0.55 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02470899 0.03221989 0.03314185 0.0360744 0.03333807 0.03217483 0.03215766 0.03216076 0.03210974 0.032897 ] mean value: 0.03209831714630127 key: score_time value: [0.02296257 0.02556992 0.02539229 0.02259994 0.02406454 0.02528095 0.02319956 0.02534366 0.02206135 0.02454829] mean value: 0.024102306365966795 key: test_mcc value: [0.58225108 0.4633482 0.4912706 0.3961039 0.4633482 0.64040632 0.63123793 0.67532468 0.35185603 0.32463131] mean value: 0.5019778248370252 key: train_mcc value: [0.76230669 0.76231938 0.76227231 0.74163306 0.73220717 0.74686824 0.73143499 0.77778965 0.77786089 0.74226246] mean value: 0.7536954854633586 key: test_accuracy value: [0.79069767 0.72093023 0.74418605 0.69767442 0.72093023 0.81395349 0.81395349 0.8372093 0.6744186 0.65116279] mean value: 0.7465116279069768 key: train_accuracy value: [0.88113695 0.88113695 0.88113695 0.87080103 0.86563307 0.87338501 0.86563307 0.88888889 0.88888889 0.87080103] mean value: 0.8767441860465116 key: test_fscore value: [0.79069767 0.75 0.71794872 0.69767442 0.75 0.8 0.80952381 0.8372093 0.70833333 0.59459459] mean value: 0.7455981850749293 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:196: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_cd_sl.py:199: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [0.88205128 0.88082902 0.8814433 0.87179487 0.86934673 0.87403599 0.86666667 0.88888889 0.88772846 0.87309645] mean value: 0.8775881653530921 key: test_precision value: [0.77272727 0.66666667 0.77777778 0.68181818 0.66666667 0.88888889 0.85 0.85714286 0.65384615 0.73333333] mean value: 0.7548867798867799 key: train_precision value: [0.87755102 0.88541667 0.8814433 0.86734694 0.84803922 0.86734694 0.85786802 0.88659794 0.89473684 0.85572139] mean value: 0.8722068272870185 key: test_recall value: [0.80952381 0.85714286 0.66666667 0.71428571 0.85714286 0.72727273 0.77272727 0.81818182 0.77272727 0.5 ] mean value: 0.7495670995670995 key: train_recall value: [0.88659794 0.87628866 0.8814433 0.87628866 0.89175258 0.88082902 0.87564767 0.89119171 0.88082902 0.89119171] mean value: 0.8832060253191603 key: test_roc_auc value: [0.79112554 0.72402597 0.74242424 0.69805195 0.72402597 0.81601732 0.81493506 0.83766234 0.67207792 0.6547619 ] mean value: 0.7475108225108226 key: train_roc_auc value: [0.8811228 0.88114951 0.88113616 0.87078682 0.86556541 0.8734042 0.86565889 0.88889482 0.88886812 0.87085359] mean value: 0.8767440307675872 key: test_jcc value: [0.65384615 0.6 0.56 0.53571429 0.6 0.66666667 0.68 0.72 0.5483871 0.42307692] mean value: 0.5987691126078223 key: train_jcc value: [0.78899083 0.78703704 0.78801843 0.77272727 0.76888889 0.77625571 0.76470588 0.8 0.79812207 0.77477477] mean value: 0.7819520888138968 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.32377195 0.44215226 0.36714745 0.3412354 0.3573339 0.34631348 0.37735128 0.42175651 0.370193 0.30671382] mean value: 0.3653969049453735 key: score_time value: [0.02374911 0.02069783 0.0234592 0.02423143 0.02390671 0.02344298 0.0230155 0.0243516 0.02337193 0.02246904] mean value: 0.02326953411102295 key: test_mcc value: [0.62770563 0.49456394 0.67462198 0.44468651 0.62770563 0.67532468 0.77418983 0.72451364 0.30265778 0.3030303 ] mean value: 0.5648999907145589 key: train_mcc value: [0.7002564 0.70030181 0.65903507 0.69005132 0.66414682 0.67974678 0.65398592 0.69510176 0.69557211 0.7059139 ] mean value: 0.6844111884571173 key: test_accuracy value: [0.81395349 0.74418605 0.8372093 0.72093023 0.81395349 0.8372093 0.88372093 0.86046512 0.65116279 0.65116279] mean value: 0.7813953488372093 key: train_accuracy value: [0.8501292 0.8501292 0.82945736 0.84496124 0.83204134 0.83979328 0.82687339 0.84754522 0.84754522 0.85271318] mean value: 0.8421188630490956 key: test_fscore value: [0.80952381 0.75555556 0.82926829 0.72727273 0.80952381 0.8372093 0.89361702 0.85714286 0.68085106 0.65116279] mean value: 0.7851127229831325 key: train_fscore value: [0.85051546 0.84974093 0.83163265 0.84693878 0.83375959 0.84102564 0.8286445 0.84754522 0.84987277 0.85496183] mean value: 0.8434637383464901 key: test_precision value: [0.80952381 0.70833333 0.85 0.69565217 0.80952381 0.85714286 0.84 0.9 0.64 0.66666667] mean value: 0.7776842650103519 key: train_precision value: [0.85051546 0.85416667 0.82323232 0.83838384 0.82741117 0.83248731 0.81818182 0.84536082 0.835 0.84 ] mean value: 0.8364739412281801 key: test_recall value: [0.80952381 0.80952381 0.80952381 0.76190476 0.80952381 0.81818182 0.95454545 0.81818182 0.72727273 0.63636364] mean value: 0.7954545454545454 key: train_recall value: [0.85051546 0.84536082 0.84020619 0.8556701 0.84020619 0.84974093 0.83937824 0.84974093 0.86528497 0.87046632] mean value: 0.8506570161850329 key: test_roc_auc value: [0.81385281 0.745671 0.83658009 0.72186147 0.81385281 0.83766234 0.88203463 0.86147186 0.64935065 0.65151515] mean value: 0.7813852813852814 key: train_roc_auc value: [0.8501282 0.85014155 0.82942952 0.8449335 0.83202019 0.83981892 0.82690561 0.84755088 0.84759094 0.85275893] mean value: 0.8421278243683564 key: test_jcc value: [0.68 0.60714286 0.70833333 0.57142857 0.68 0.72 0.80769231 0.75 0.51612903 0.48275862] mean value: 0.6523484722544789 key: train_jcc value: [0.73991031 0.73873874 0.71179039 0.73451327 0.71491228 0.72566372 0.70742358 0.73542601 0.73893805 0.74666667] mean value: 0.7293983027024029 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7