/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_cd_7030.py:548: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead. from pandas import MultiIndex, Int64Index 1.22.4 1.4.1 aaindex_df contains non-numerical data Total no. of non-numerial columns: 2 Selecting numerical data only PASS: successfully selected numerical columns only for aaindex_df Now checking for NA in the remaining aaindex_cols Counting aaindex_df cols with NA ncols with NA: 4 columns Dropping these... Original ncols: 127 Revised df ncols: 123 Checking NA in revised df... PASS: cols with NA successfully dropped from aaindex_df Proceeding with combining aa_df with other features_df PASS: ncols match Expected ncols: 123 Got: 123 Total no. of columns in clean aa_df: 123 Proceeding to merge, expected nrows in merged_df: 817 PASS: my_features_df and aa_df successfully combined nrows: 817 ncols: 269 count of NULL values before imputation or_mychisq 244 log10_or_mychisq 244 dtype: int64 count of NULL values AFTER imputation mutationinformation 0 or_rawI 0 logorI 0 dtype: int64 PASS: OR values imputed, data ready for ML Total no. of features for aaindex: 123 No. of numerical features: 168 No. of categorical features: 7 PASS: x_features has no target variable No. of columns for x_features: 175 ------------------------------------------------------------- Successfully split data with stratification [COMPLETE data]: 70/30 Original data size: (817, 175) Train data size: (547, 175) Test data size: (270, 175) y_train numbers: Counter({0: 317, 1: 230}) y_train ratio: 1.3782608695652174 y_test_numbers: Counter({0: 156, 1: 114}) y_test ratio: 1.368421052631579 ------------------------------------------------------------- index: 0 ind: 1 Mask count check: True index: 1 ind: 2 Mask count check: True Original Data Counter({0: 317, 1: 230}) Data dim: (547, 175) Simple Random OverSampling Counter({1: 317, 0: 317}) (634, 175) Simple Random UnderSampling Counter({0: 230, 1: 230}) (460, 175) Simple Combined Over and UnderSampling Counter({0: 317, 1: 317}) (634, 175) SMOTE_NC OverSampling Counter({1: 317, 0: 317}) (634, 175) ##################################################################### Running ML analysis [COMPLETE DATA]: 70/30 split Gene name: katG Drug name: isoniazid Output directory: /home/tanu/git/Data/isoniazid/output/ml/tts_cd_7030/ Sanity checks: Total input features: 175 Training data size: (547, 175) Test data size: (270, 175) Target feature numbers (training data): Counter({0: 317, 1: 230}) Target features ratio (training data: 1.3782608695652174 Target feature numbers (test data): Counter({0: 156, 1: 114}) Target features ratio (test data): 1.368421052631579 ##################################################################### ================================================================ Strucutral features (n): 36 These are: Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist'] FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss'] Other struc columns: ['rsa', 'kd_values', 'rd_values'] ================================================================ AAindex features (n): 123 These are: ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'] ================================================================ Evolutionary features (n): 3 These are: ['consurf_score', 'snap2_score', 'provean_score'] ================================================================ Genomic features (n): 6 These are: ['maf', 'logorI'] ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'] ================================================================ Categorical features (n): 7 These are: ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'] ================================================================ Pass: No. of features match ##################################################################### Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04179597 0.03566599 0.0486362 0.03750038 0.06210589 0.05623746 0.03859067 0.03731346 0.03781509 0.03734136] mean value: 0.04330024719238281 key: score_time value: [0.01308751 0.01230502 0.01895547 0.01553655 0.02064085 0.01912975 0.01783442 0.01562071 0.01579142 0.01538372] mean value: 0.016428542137145997 key: test_mcc value: [0.58514212 0.51163988 0.42094935 0.56841568 0.66622595 0.48454371 0.48270989 0.56175441 0.70238885 0.61883928] mean value: 0.5602609133851035 key: train_mcc value: [0.68269604 0.70811086 0.70414271 0.69939808 0.68309179 0.72955341 0.69675461 0.75489046 0.69671044 0.70063657] mean value: 0.7055984965434541 key: test_accuracy value: [0.8 0.76363636 0.70909091 0.78181818 0.83636364 0.74545455 0.74545455 0.77777778 0.85185185 0.81481481] mean value: 0.7826262626262627 key: train_accuracy value: [0.84552846 0.85772358 0.85569106 0.85365854 0.84552846 0.86788618 0.85162602 0.88032454 0.85192698 0.85395538] mean value: 0.8563849172974488 key: test_fscore value: [0.74418605 0.71111111 0.68 0.76 0.80851064 0.70833333 0.61111111 0.76 0.83333333 0.77272727] mean value: 0.7389312846425662 key: train_fscore value: [0.81553398 0.83091787 0.82891566 0.82524272 0.81642512 0.8441247 0.82577566 0.85851319 0.82494005 0.82692308] mean value: 0.829731202774635 key: test_precision value: [0.8 0.72727273 0.62962963 0.7037037 0.79166667 0.68 0.84615385 0.7037037 0.8 0.80952381] mean value: 0.7491654086654087 key: train_precision value: [0.8195122 0.83091787 0.82692308 0.82926829 0.81642512 0.83809524 0.81603774 0.85238095 0.81904762 0.82296651] mean value: 0.8271574612446937 key: test_recall value: [0.69565217 0.69565217 0.73913043 0.82608696 0.82608696 0.73913043 0.47826087 0.82608696 0.86956522 0.73913043] mean value: 0.7434782608695651 key: train_recall value: [0.8115942 0.83091787 0.83091787 0.82125604 0.81642512 0.85024155 0.83574879 0.8647343 0.83091787 0.83091787] mean value: 0.8323671497584542 key: test_roc_auc value: [0.78532609 0.75407609 0.71331522 0.78804348 0.83491848 0.74456522 0.70788043 0.78401122 0.85413745 0.80504909] mean value: 0.7771322755960729 key: train_roc_auc value: [0.84088482 0.85405543 0.85230104 0.84922451 0.84154589 0.86547165 0.84945334 0.87817135 0.84902537 0.85077362] mean value: 0.8530907028389867 key: test_jcc value: [0.59259259 0.55172414 0.51515152 0.61290323 0.67857143 0.5483871 0.44 0.61290323 0.71428571 0.62962963] mean value: 0.5896148566549011 key: train_jcc value: [0.68852459 0.7107438 0.70781893 0.70247934 0.68979592 0.73029046 0.70325203 0.75210084 0.70204082 0.70491803] mean value: 0.7091964757469712 MCC on Blind test: 0.46 Accuracy on Blind test: 0.74 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.84767032 0.98310471 0.90616822 0.91747785 1.05338669 1.27765465 1.05463815 0.8606627 1.05289483 0.91209054] mean value: 0.9865748643875122 key: score_time value: [0.01637244 0.01902914 0.01560688 0.01560307 0.01704359 0.01559401 0.01545811 0.01693511 0.01557612 0.01548362] mean value: 0.016270208358764648 key: test_mcc value: [0.58703744 0.70108696 0.50741958 0.67387468 0.70662625 0.61131498 0.51203338 0.62728193 0.66155709 0.65775818] mean value: 0.6245990450015361 key: train_mcc value: [0.86303482 0.83412421 0.85822527 0.82876173 0.82943939 0.8580675 0.85653699 0.85096756 0.82173265 0.87508868] mean value: 0.8475978801436712 key: test_accuracy value: [0.8 0.85454545 0.74545455 0.83636364 0.85454545 0.8 0.76363636 0.81481481 0.83333333 0.83333333] mean value: 0.8136026936026937 key: train_accuracy value: [0.93292683 0.91869919 0.93089431 0.91666667 0.91666667 0.93089431 0.92886179 0.92697769 0.9127789 0.93914807] mean value: 0.9254514421411962 key: test_fscore value: [0.73170732 0.82608696 0.73076923 0.81632653 0.83333333 0.78431373 0.66666667 0.79166667 0.80851064 0.79069767] mean value: 0.7780078739849725 key: train_fscore value: [0.92124105 0.9047619 0.9178744 0.90024331 0.90167866 0.91747573 0.91803279 0.91428571 0.8973747 0.92753623] mean value: 0.9120504479974278 key: test_precision value: [0.83333333 0.82608696 0.65517241 0.76923077 0.8 0.71428571 0.8125 0.76 0.79166667 0.85 ] mean value: 0.7812275853831326 key: train_precision value: [0.91037736 0.89201878 0.9178744 0.90686275 0.8952381 0.92195122 0.89090909 0.90140845 0.88679245 0.92753623] mean value: 0.9050968820144447 key: test_recall value: [0.65217391 0.82608696 0.82608696 0.86956522 0.86956522 0.86956522 0.56521739 0.82608696 0.82608696 0.73913043] mean value: 0.7869565217391304 key: train_recall value: [0.93236715 0.9178744 0.9178744 0.89371981 0.90821256 0.91304348 0.9468599 0.92753623 0.90821256 0.92753623] mean value: 0.9193236714975845 key: test_roc_auc value: [0.77921196 0.85054348 0.75679348 0.84103261 0.85665761 0.80978261 0.7357337 0.81626928 0.83239832 0.82117812] mean value: 0.8099601157082749 key: train_roc_auc value: [0.93285024 0.91858632 0.92911264 0.91352657 0.91550979 0.92845156 0.93132469 0.92705483 0.91214824 0.93754434] mean value: 0.9246109217505099 key: test_jcc value: [0.57692308 0.7037037 0.57575758 0.68965517 0.71428571 0.64516129 0.5 0.65517241 0.67857143 0.65384615] mean value: 0.639307652961713 key: train_jcc value: [0.8539823 0.82608696 0.84821429 0.81858407 0.8209607 0.84753363 0.84848485 0.84210526 0.81385281 0.86486486] mean value: 0.8384669735254815 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01469278 0.01110578 0.01064801 0.01035357 0.01014662 0.01015067 0.01030087 0.01057434 0.01018524 0.01026011] mean value: 0.010841798782348634 key: score_time value: [0.01228666 0.00966001 0.00951886 0.00936699 0.00913262 0.0093379 0.00910378 0.00910497 0.00907636 0.00917196] mean value: 0.009576010704040527 key: test_mcc value: [0.27863911 0.61131498 0.37855111 0.34721618 0.55163043 0.53758181 0.39590764 0.34195219 0.41195324 0.17664036] mean value: 0.4031387068121861 key: train_mcc value: [0.43665035 0.53827187 0.5702923 0.49863477 0.46849367 0.51713016 0.49341477 0.51754167 0.48030445 0.48526784] mean value: 0.5006001832001062 key: test_accuracy value: [0.65454545 0.8 0.69090909 0.67272727 0.78181818 0.76363636 0.70909091 0.62962963 0.7037037 0.59259259] mean value: 0.6998653198653199 key: train_accuracy value: [0.7296748 0.7703252 0.78658537 0.75203252 0.73373984 0.7601626 0.74796748 0.72616633 0.74239351 0.73833671] mean value: 0.7487384356602187 key: test_fscore value: [0.55813953 0.78431373 0.65306122 0.64 0.73913043 0.74509804 0.63636364 0.66666667 0.68 0.54166667] mean value: 0.6644439928558977 key: train_fscore value: [0.6395664 0.74141876 0.75862069 0.71759259 0.70561798 0.7293578 0.71689498 0.73887814 0.70804598 0.71772429] mean value: 0.7173717604061177 key: test_precision value: [0.6 0.71428571 0.61538462 0.59259259 0.73913043 0.67857143 0.66666667 0.54054054 0.62962963 0.52 ] mean value: 0.6296801622453796 key: train_precision value: [0.72839506 0.70434783 0.72368421 0.68888889 0.65966387 0.69432314 0.67965368 0.61612903 0.6754386 0.656 ] mean value: 0.682652430528455 key: test_recall value: [0.52173913 0.86956522 0.69565217 0.69565217 0.73913043 0.82608696 0.60869565 0.86956522 0.73913043 0.56521739] mean value: 0.7130434782608696 key: train_recall value: [0.57004831 0.7826087 0.79710145 0.74879227 0.75845411 0.76811594 0.75845411 0.92270531 0.74396135 0.79227053] mean value: 0.7642512077294686 key: test_roc_auc value: [0.63586957 0.80978261 0.69157609 0.67595109 0.77581522 0.77241848 0.69497283 0.66058906 0.70827489 0.58906031] mean value: 0.7014310133239832 key: train_roc_auc value: [0.70783117 0.7720061 0.78802441 0.75158912 0.73712179 0.76125095 0.74940249 0.7533107 0.74261005 0.74578562] mean value: 0.7508932397376333 key: test_jcc value: [0.38709677 0.64516129 0.48484848 0.47058824 0.5862069 0.59375 0.46666667 0.5 0.51515152 0.37142857] mean value: 0.5020898434457208 key: train_jcc value: [0.47011952 0.58909091 0.61111111 0.55956679 0.54513889 0.57400722 0.55871886 0.58588957 0.5480427 0.55972696] mean value: 0.560141253706926 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0105474 0.01046133 0.01060605 0.01042318 0.0104599 0.01057959 0.01052642 0.01053691 0.01061201 0.01110291] mean value: 0.010585570335388183 key: score_time value: [0.00915074 0.0091815 0.00931263 0.00919509 0.00912094 0.00941777 0.00916123 0.00909567 0.00914884 0.00942039] mean value: 0.009220480918884277 key: test_mcc value: [0.43639872 0.35136547 0.26246118 0.47690217 0.37855111 0.51757513 0.52002216 0.36008804 0.46984572 0.35286527] mean value: 0.41260749687190534 key: train_mcc value: [0.53522558 0.48022157 0.52596414 0.55027047 0.50963899 0.55595916 0.55027047 0.55363278 0.52533698 0.52804105] mean value: 0.5314561192117819 key: test_accuracy value: [0.72727273 0.69090909 0.63636364 0.74545455 0.69090909 0.76363636 0.76363636 0.68518519 0.74074074 0.68518519] mean value: 0.712929292929293 key: train_accuracy value: [0.77439024 0.74390244 0.76829268 0.7804878 0.75813008 0.78455285 0.7804878 0.78296146 0.76876268 0.77079108] mean value: 0.7712759115420769 key: test_fscore value: [0.66666667 0.58536585 0.58333333 0.69565217 0.65306122 0.72340426 0.64864865 0.63829787 0.69565217 0.62222222] mean value: 0.6512304424504864 key: train_fscore value: [0.72727273 0.70560748 0.72727273 0.74038462 0.72261072 0.73891626 0.74038462 0.73965937 0.72463768 0.72371638] mean value: 0.7290462570692664 key: test_precision value: [0.68181818 0.66666667 0.56 0.69565217 0.61538462 0.70833333 0.85714286 0.625 0.69565217 0.63636364] mean value: 0.6742013638535378 key: train_precision value: [0.74 0.68325792 0.72037915 0.73684211 0.6981982 0.75376884 0.73684211 0.74509804 0.72463768 0.73267327] mean value: 0.7271697306118926 key: test_recall value: [0.65217391 0.52173913 0.60869565 0.69565217 0.69565217 0.73913043 0.52173913 0.65217391 0.69565217 0.60869565] mean value: 0.6391304347826087 key: train_recall value: [0.71497585 0.7294686 0.73429952 0.74396135 0.74879227 0.72463768 0.74396135 0.73429952 0.72463768 0.71497585] mean value: 0.7314009661835749 key: test_roc_auc value: [0.71671196 0.66711957 0.63247283 0.73845109 0.69157609 0.76019022 0.72961957 0.68092567 0.73492286 0.67531557] mean value: 0.7027305399719495 key: train_roc_auc value: [0.76625985 0.74192728 0.76364099 0.77548945 0.75685228 0.77635393 0.77548945 0.77624067 0.76266849 0.76308233] mean value: 0.7658004708233541 key: test_jcc value: [0.5 0.4137931 0.41176471 0.53333333 0.48484848 0.56666667 0.48 0.46875 0.53333333 0.4516129 ] mean value: 0.48441025307382535 key: train_jcc value: [0.57142857 0.54512635 0.57142857 0.58778626 0.56569343 0.5859375 0.58778626 0.58687259 0.56818182 0.56704981] mean value: 0.5737291159872184 MCC on Blind test: 0.29 Accuracy on Blind test: 0.66 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.0096848 0.01012468 0.0108521 0.01071572 0.01056528 0.00939178 0.01068068 0.01059747 0.01128244 0.01058197] mean value: 0.01044769287109375 key: score_time value: [0.07402992 0.01488328 0.01613545 0.01658916 0.0157609 0.01435757 0.01600146 0.01821113 0.01360273 0.0135684 ] mean value: 0.021314001083374022 key: test_mcc value: [0.15082668 0.26246118 0.30472022 0.13543408 0.39181209 0.50741958 0.26855929 0.31155357 0.46984572 0.07116075] mean value: 0.2873793170233383 key: train_mcc value: [0.56875384 0.53684571 0.56134034 0.57605667 0.55427498 0.55955672 0.5401765 0.63143761 0.527033 0.56456169] mean value: 0.5620037067821323 key: test_accuracy value: [0.6 0.63636364 0.65454545 0.58181818 0.70909091 0.74545455 0.65454545 0.66666667 0.74074074 0.55555556] mean value: 0.6544781144781144 key: train_accuracy value: [0.79065041 0.77439024 0.78658537 0.79471545 0.78455285 0.78658537 0.77642276 0.82150101 0.77079108 0.78904665] mean value: 0.7875241181417899 key: test_fscore value: [0.45 0.58333333 0.6122449 0.48888889 0.61904762 0.73076923 0.51282051 0.59090909 0.69565217 0.42857143] mean value: 0.5712237176212331 key: train_fscore value: [0.74692875 0.73123487 0.74452555 0.74812968 0.73232323 0.73945409 0.73170732 0.78109453 0.72098765 0.74257426] mean value: 0.7418959919811685 key: test_precision value: [0.52941176 0.56 0.57692308 0.5 0.68421053 0.65517241 0.625 0.61904762 0.69565217 0.47368421] mean value: 0.5919101785224831 key: train_precision value: [0.76 0.73300971 0.75 0.77319588 0.76719577 0.76020408 0.73891626 0.80512821 0.73737374 0.76142132] mean value: 0.7586444952311476 key: test_recall value: [0.39130435 0.60869565 0.65217391 0.47826087 0.56521739 0.82608696 0.43478261 0.56521739 0.69565217 0.39130435] mean value: 0.5608695652173913 key: train_recall value: [0.73429952 0.7294686 0.73913043 0.72463768 0.70048309 0.71980676 0.72463768 0.75845411 0.70531401 0.72463768] mean value: 0.7260869565217392 key: test_roc_auc value: [0.57065217 0.63247283 0.65421196 0.56725543 0.6888587 0.75679348 0.6236413 0.65357644 0.73492286 0.53436185] mean value: 0.6416747019635344 key: train_roc_auc value: [0.78293923 0.76824307 0.78009153 0.78512586 0.77304856 0.77744724 0.76933638 0.81279349 0.76174791 0.78015101] mean value: 0.7790924293098207 key: test_jcc value: [0.29032258 0.41176471 0.44117647 0.32352941 0.44827586 0.57575758 0.34482759 0.41935484 0.53333333 0.27272727] mean value: 0.4061069637684177 key: train_jcc value: [0.59607843 0.57633588 0.59302326 0.59760956 0.57768924 0.58661417 0.57692308 0.64081633 0.56370656 0.59055118] mean value: 0.5899347691320936 MCC on Blind test: 0.24 Accuracy on Blind test: 0.64 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02656698 0.02517915 0.02399397 0.02391505 0.02421665 0.02414322 0.02453303 0.02408743 0.02439761 0.0245142 ] mean value: 0.024554729461669922 key: score_time value: [0.01422763 0.01268554 0.0124557 0.01299238 0.01262712 0.01349807 0.0126636 0.01236033 0.01320577 0.012743 ] mean value: 0.012945914268493652 key: test_mcc value: [0.4299228 0.50848012 0.5262129 0.43189061 0.54764925 0.49468252 0.46046933 0.30312793 0.54201786 0.58258986] mean value: 0.4827043155774927 key: train_mcc value: [0.62618197 0.61352703 0.63465505 0.56215877 0.57498214 0.6140883 0.57937741 0.63511179 0.60989754 0.64470036] mean value: 0.6094680362515074 key: test_accuracy value: [0.72727273 0.76363636 0.76363636 0.72727273 0.78181818 0.74545455 0.72727273 0.66666667 0.77777778 0.7962963 ] mean value: 0.7477104377104378 key: train_accuracy value: [0.81910569 0.81300813 0.82317073 0.78861789 0.79471545 0.81300813 0.79674797 0.82352941 0.81135903 0.82758621] mean value: 0.8110848628770263 key: test_fscore value: [0.63414634 0.68292683 0.73469388 0.65116279 0.72727273 0.72 0.54545455 0.55 0.72727273 0.73170732] mean value: 0.6704637156053572 key: train_fscore value: [0.77468354 0.76767677 0.77974684 0.72916667 0.73901809 0.77114428 0.74489796 0.77862595 0.76574307 0.79115479] mean value: 0.7641857958137329 key: test_precision value: [0.72222222 0.77777778 0.69230769 0.7 0.76190476 0.66666667 0.9 0.64705882 0.76190476 0.83333333] mean value: 0.7463176039646627 key: train_precision value: [0.81382979 0.8042328 0.81914894 0.79096045 0.79444444 0.79487179 0.78918919 0.82258065 0.8 0.805 ] mean value: 0.8034258053281179 key: test_recall value: [0.56521739 0.60869565 0.7826087 0.60869565 0.69565217 0.7826087 0.39130435 0.47826087 0.69565217 0.65217391] mean value: 0.6260869565217392 key: train_recall value: [0.73913043 0.73429952 0.74396135 0.6763285 0.69082126 0.74879227 0.70531401 0.73913043 0.73429952 0.77777778] mean value: 0.7289855072463768 key: test_roc_auc value: [0.7044837 0.74184783 0.76630435 0.71059783 0.76970109 0.75067935 0.68002717 0.64235624 0.76718093 0.77769986] mean value: 0.7310878330995793 key: train_roc_auc value: [0.80816171 0.80223748 0.81233155 0.77325197 0.78049835 0.8042207 0.78423595 0.81187291 0.80071619 0.82070707] mean value: 0.7998233879011911 key: test_jcc value: [0.46428571 0.51851852 0.58064516 0.48275862 0.57142857 0.5625 0.375 0.37931034 0.57142857 0.57692308] mean value: 0.5082798579392016 key: train_jcc value: [0.6322314 0.62295082 0.63900415 0.57377049 0.58606557 0.62753036 0.59349593 0.6375 0.62040816 0.65447154] mean value: 0.6187428446894745 MCC on Blind test: 0.29 Accuracy on Blind test: 0.66 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.87738419 1.86355472 1.93836021 1.95548415 2.00255823 1.92464495 1.87634706 1.90383768 1.97237706 2.0040803 ] mean value: 1.9318628549575805 key: score_time value: [0.01933503 0.01476908 0.01467705 0.0127182 0.01512384 0.01787472 0.01688218 0.01641417 0.02369785 0.01700139] mean value: 0.016849350929260255 key: test_mcc value: [0.2859164 0.51757513 0.40378643 0.50741958 0.59190054 0.60004379 0.39594831 0.45526408 0.66901612 0.50265363] mean value: 0.49295240047741534 key: train_mcc value: [0.96305083 0.93934297 0.95474605 0.95876404 0.96664124 0.97538269 0.97094453 0.98335709 0.96267627 0.98335709] mean value: 0.9658262804528608 key: test_accuracy value: [0.65454545 0.76363636 0.69090909 0.74545455 0.8 0.8 0.70909091 0.72222222 0.83333333 0.75925926] mean value: 0.7478451178451179 key: train_accuracy value: [0.98170732 0.9695122 0.97764228 0.9796748 0.98373984 0.98780488 0.98577236 0.99188641 0.98174442 0.99188641] mean value: 0.9831370899915896 key: test_fscore value: [0.57777778 0.72340426 0.67924528 0.73076923 0.76595745 0.7755102 0.55555556 0.70588235 0.81632653 0.69767442] mean value: 0.7028103055488797 key: train_fscore value: [0.97862233 0.96487119 0.97387173 0.97619048 0.98067633 0.98571429 0.98321343 0.99029126 0.97841727 0.99029126] mean value: 0.9802159566259778 key: test_precision value: [0.59090909 0.70833333 0.6 0.65517241 0.75 0.73076923 0.76923077 0.64285714 0.76923077 0.75 ] mean value: 0.696650275012344 key: train_precision value: [0.96261682 0.93636364 0.95794393 0.96244131 0.98067633 0.97183099 0.97619048 0.99512195 0.97142857 0.99512195] mean value: 0.9709735963057159 key: test_recall value: [0.56521739 0.73913043 0.7826087 0.82608696 0.7826087 0.82608696 0.43478261 0.7826087 0.86956522 0.65217391] mean value: 0.7260869565217392 key: train_recall value: [0.99516908 0.99516908 0.99033816 0.99033816 0.98067633 1. 0.99033816 0.98550725 0.98550725 0.98550725] mean value: 0.9898550724637681 key: test_roc_auc value: [0.6419837 0.76019022 0.70380435 0.75679348 0.79755435 0.80366848 0.6705163 0.73001403 0.83800842 0.7454418 ] mean value: 0.7447975105189341 key: train_roc_auc value: [0.98354945 0.97302314 0.97937961 0.98113399 0.98332062 0.98947368 0.98639715 0.99100537 0.98226411 0.99100537] mean value: 0.9840552506227563 key: test_jcc value: [0.40625 0.56666667 0.51428571 0.57575758 0.62068966 0.63333333 0.38461538 0.54545455 0.68965517 0.53571429] mean value: 0.5472422333413712 key: train_jcc value: [0.95813953 0.9321267 0.94907407 0.95348837 0.96208531 0.97183099 0.96698113 0.98076923 0.95774648 0.98076923] mean value: 0.9613011044342935 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04909205 0.02950025 0.02932739 0.02902603 0.02834392 0.02826023 0.02964878 0.03143978 0.0290525 0.02814364] mean value: 0.031183457374572753 key: score_time value: [0.00961804 0.00957942 0.00933838 0.01010132 0.00951147 0.00917363 0.00951052 0.00927472 0.00931883 0.00968361] mean value: 0.009510993957519531 key: test_mcc value: [0.62352005 0.82153646 0.70108696 0.75878131 0.78065376 0.70662625 0.66559476 0.58152196 0.552175 0.741478 ] mean value: 0.6932974500375436 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.81818182 0.90909091 0.85454545 0.87272727 0.89090909 0.85454545 0.83636364 0.7962963 0.77777778 0.87037037] mean value: 0.8480808080808081 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.77272727 0.89795918 0.82608696 0.8627451 0.875 0.83333333 0.7804878 0.75555556 0.75 0.82926829] mean value: 0.8183163497411561 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.80952381 0.84615385 0.82608696 0.78571429 0.84 0.8 0.88888889 0.77272727 0.72 0.94444444] mean value: 0.8233539503974286 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.73913043 0.95652174 0.82608696 0.95652174 0.91304348 0.86956522 0.69565217 0.73913043 0.7826087 0.73913043] mean value: 0.8217391304347826 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.80706522 0.91576087 0.85054348 0.88451087 0.89402174 0.85665761 0.81657609 0.78892006 0.77840112 0.85343619] mean value: 0.8445893232819074 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.62962963 0.81481481 0.7037037 0.75862069 0.77777778 0.71428571 0.64 0.60714286 0.6 0.70833333] mean value: 0.6954308520343003 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.61 Accuracy on Blind test: 0.81 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.14270854 0.14585114 0.13640237 0.13542676 0.1369288 0.13830209 0.13576722 0.13378549 0.13537693 0.1409452 ] mean value: 0.13814945220947267 key: score_time value: [0.01938272 0.01929069 0.01900387 0.01840878 0.01852822 0.01847625 0.01848555 0.01832962 0.01863289 0.01892066] mean value: 0.018745923042297365 key: test_mcc value: [0.58703744 0.50851637 0.44324972 0.56841568 0.70108696 0.56841568 0.39125402 0.42510136 0.51911209 0.62566799] mean value: 0.5337857310299543 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.76363636 0.72727273 0.78181818 0.85454545 0.78181818 0.70909091 0.72222222 0.75925926 0.81481481] mean value: 0.7714478114478115 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73170732 0.69767442 0.68085106 0.76 0.82608696 0.76 0.57894737 0.65116279 0.73469388 0.75 ] mean value: 0.7171123792699096 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 0.75 0.66666667 0.7037037 0.82608696 0.7037037 0.73333333 0.7 0.69230769 0.88235294] mean value: 0.7491488330746643 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.65217391 0.65217391 0.69565217 0.82608696 0.82608696 0.82608696 0.47826087 0.60869565 0.7826087 0.65217391] mean value: 0.7 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77921196 0.74796196 0.72282609 0.78804348 0.85054348 0.78804348 0.67663043 0.70757363 0.76227209 0.79382889] mean value: 0.7616935483870968 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57692308 0.53571429 0.51612903 0.61290323 0.7037037 0.61290323 0.40740741 0.48275862 0.58064516 0.6 ] mean value: 0.5629087739599419 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.44 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0106895 0.0107975 0.01070595 0.01074672 0.01078081 0.01070833 0.0135324 0.01077175 0.01183271 0.01100755] mean value: 0.01115732192993164 key: score_time value: [0.00930071 0.00926113 0.00914121 0.00889707 0.0094676 0.00902438 0.00900888 0.00885224 0.00996494 0.00909019] mean value: 0.009200835227966308 key: test_mcc value: [0.39590764 0.21105878 0.30472022 0.25271739 0.27280815 0.51163988 0.09242443 0.31155357 0.27664637 0.08108929] mean value: 0.27105657192462734 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.70909091 0.58181818 0.65454545 0.63636364 0.65454545 0.76363636 0.56363636 0.66666667 0.64814815 0.55555556] mean value: 0.6434006734006734 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.63636364 0.59649123 0.6122449 0.56521739 0.53658537 0.71111111 0.45454545 0.59090909 0.57777778 0.45454545] mean value: 0.5735791408439891 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.5 0.57692308 0.56521739 0.61111111 0.72727273 0.47619048 0.61904762 0.59090909 0.47619048] mean value: 0.5809528635615592 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.60869565 0.73913043 0.65217391 0.56521739 0.47826087 0.69565217 0.43478261 0.56521739 0.56521739 0.43478261] mean value: 0.5739130434782609 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.69497283 0.60394022 0.65421196 0.6263587 0.62975543 0.75407609 0.5455163 0.65357644 0.63744741 0.53997195] mean value: 0.6339827314165498 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.46666667 0.425 0.44117647 0.39393939 0.36666667 0.55172414 0.29411765 0.41935484 0.40625 0.29411765] mean value: 0.40590134686193213 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.22 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.00051999 1.9661603 1.94647527 1.96586847 1.95577812 1.93665648 1.92854357 1.96198702 1.95868993 1.9296658 ] mean value: 1.9550344944000244 key: score_time value: [0.09752798 0.09560466 0.0961225 0.09526682 0.09944749 0.09337711 0.09278417 0.10019994 0.09292531 0.09270859] mean value: 0.09559645652770996 key: test_mcc value: [0.70187922 0.88784567 0.74055136 0.68504815 0.75878131 0.75878131 0.70187922 0.5802059 0.77749578 0.78693802] mean value: 0.7379405937626433 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.85454545 0.94545455 0.87272727 0.83636364 0.87272727 0.87272727 0.85454545 0.7962963 0.88888889 0.88888889] mean value: 0.8683164983164983 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.80952381 0.93333333 0.85106383 0.82352941 0.8627451 0.8627451 0.80952381 0.74418605 0.875 0.85 ] mean value: 0.8421650436522952 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89473684 0.95454545 0.83333333 0.75 0.78571429 0.78571429 0.89473684 0.8 0.84 1. ] mean value: 0.8538781043517886 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.73913043 0.91304348 0.86956522 0.91304348 0.95652174 0.95652174 0.73913043 0.69565217 0.91304348 0.73913043] mean value: 0.8434782608695652 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83831522 0.94089674 0.87228261 0.84714674 0.88451087 0.88451087 0.83831522 0.78330996 0.89200561 0.86956522] mean value: 0.865085904628331 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68 0.875 0.74074074 0.7 0.75862069 0.75862069 0.68 0.59259259 0.77777778 0.73913043] mean value: 0.7302482925204065 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.64 Accuracy on Blind test: 0.83 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [1.84873724 0.99962497 1.07342482 1.02383947 1.04548478 1.01116562 1.02130485 0.99912429 1.02942729 1.06653047] mean value: 1.1118663787841796 key: score_time value: [0.21156573 0.26648188 0.28589416 0.15180969 0.2804842 0.25872636 0.24671173 0.21345377 0.25705051 0.28484893] mean value: 0.24570269584655763 key: test_mcc value: [0.66559476 0.88920218 0.78065376 0.70662625 0.75878131 0.74770557 0.73839363 0.65775818 0.81229162 0.82092207] mean value: 0.7577929322746051 key: train_mcc value: [0.92932537 0.92916753 0.94173192 0.93358762 0.92552675 0.94190647 0.92507398 0.9418183 0.91734185 0.92563864] mean value: 0.9311118449851191 key: test_accuracy value: [0.83636364 0.94545455 0.89090909 0.85454545 0.87272727 0.87272727 0.87272727 0.83333333 0.90740741 0.90740741] mean value: 0.8793602693602693 key: train_accuracy value: [0.96544715 0.96544715 0.97154472 0.96747967 0.96341463 0.97154472 0.96341463 0.97160243 0.95943205 0.96348884] mean value: 0.9662816009498837 key: test_fscore value: [0.7804878 0.93617021 0.875 0.83333333 0.8627451 0.85714286 0.8372093 0.79069767 0.89361702 0.87804878] mean value: 0.8544452084667999 key: train_fscore value: [0.95923261 0.95903614 0.96634615 0.96172249 0.95714286 0.96650718 0.95673077 0.96634615 0.95238095 0.95714286] mean value: 0.96025881671487 key: test_precision value: [0.88888889 0.91666667 0.84 0.8 0.78571429 0.80769231 0.9 0.85 0.875 1. ] mean value: 0.8663962148962149 key: train_precision value: [0.95238095 0.95673077 0.96172249 0.95260664 0.94366197 0.95734597 0.95215311 0.96172249 0.93896714 0.94366197] mean value: 0.9520953494183401 key: test_recall value: [0.69565217 0.95652174 0.91304348 0.86956522 0.95652174 0.91304348 0.7826087 0.73913043 0.91304348 0.7826087 ] mean value: 0.8521739130434782 key: train_recall value: [0.96618357 0.96135266 0.97101449 0.97101449 0.97101449 0.97584541 0.96135266 0.97101449 0.96618357 0.97101449] mean value: 0.9685990338164251 key: test_roc_auc value: [0.81657609 0.94701087 0.89402174 0.85665761 0.88451087 0.87839674 0.86005435 0.82117812 0.90813464 0.89130435] mean value: 0.8757845371669004 key: train_roc_auc value: [0.96554793 0.96488685 0.97147216 0.96796339 0.96445461 0.97213323 0.96313247 0.97152123 0.96036451 0.96452823] mean value: 0.9666004615775782 key: test_jcc value: [0.64 0.88 0.77777778 0.71428571 0.75862069 0.75 0.72 0.65384615 0.80769231 0.7826087 ] mean value: 0.74848313389093 key: train_jcc value: [0.92165899 0.9212963 0.93488372 0.92626728 0.91780822 0.93518519 0.91705069 0.93488372 0.90909091 0.91780822] mean value: 0.9235933229314366 MCC on Blind test: 0.69 Accuracy on Blind test: 0.85 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01144552 0.01041913 0.01031971 0.01029325 0.01053071 0.01064229 0.01054692 0.01049185 0.01066327 0.01136136] mean value: 0.010671401023864746 key: score_time value: [0.00913405 0.00913501 0.00921345 0.0090847 0.00909209 0.00911283 0.00987315 0.00917554 0.00909829 0.00913 ] mean value: 0.009204912185668945 key: test_mcc value: [0.43639872 0.35136547 0.26246118 0.47690217 0.37855111 0.51757513 0.52002216 0.36008804 0.46984572 0.35286527] mean value: 0.41260749687190534 key: train_mcc value: [0.53522558 0.48022157 0.52596414 0.55027047 0.50963899 0.55595916 0.55027047 0.55363278 0.52533698 0.52804105] mean value: 0.5314561192117819 key: test_accuracy value: [0.72727273 0.69090909 0.63636364 0.74545455 0.69090909 0.76363636 0.76363636 0.68518519 0.74074074 0.68518519] mean value: 0.712929292929293 key: train_accuracy value: [0.77439024 0.74390244 0.76829268 0.7804878 0.75813008 0.78455285 0.7804878 0.78296146 0.76876268 0.77079108] mean value: 0.7712759115420769 key: test_fscore value: [0.66666667 0.58536585 0.58333333 0.69565217 0.65306122 0.72340426 0.64864865 0.63829787 0.69565217 0.62222222] mean value: 0.6512304424504864 key: train_fscore value: [0.72727273 0.70560748 0.72727273 0.74038462 0.72261072 0.73891626 0.74038462 0.73965937 0.72463768 0.72371638] mean value: 0.7290462570692664 key: test_precision value: [0.68181818 0.66666667 0.56 0.69565217 0.61538462 0.70833333 0.85714286 0.625 0.69565217 0.63636364] mean value: 0.6742013638535378 key: train_precision value: [0.74 0.68325792 0.72037915 0.73684211 0.6981982 0.75376884 0.73684211 0.74509804 0.72463768 0.73267327] mean value: 0.7271697306118926 key: test_recall value: [0.65217391 0.52173913 0.60869565 0.69565217 0.69565217 0.73913043 0.52173913 0.65217391 0.69565217 0.60869565] mean value: 0.6391304347826087 key: train_recall value: [0.71497585 0.7294686 0.73429952 0.74396135 0.74879227 0.72463768 0.74396135 0.73429952 0.72463768 0.71497585] mean value: 0.7314009661835749 key: test_roc_auc value: [0.71671196 0.66711957 0.63247283 0.73845109 0.69157609 0.76019022 0.72961957 0.68092567 0.73492286 0.67531557] mean value: 0.7027305399719495 key: train_roc_auc value: [0.76625985 0.74192728 0.76364099 0.77548945 0.75685228 0.77635393 0.77548945 0.77624067 0.76266849 0.76308233] mean value: 0.7658004708233541 key: test_jcc value: [0.5 0.4137931 0.41176471 0.53333333 0.48484848 0.56666667 0.48 0.46875 0.53333333 0.4516129 ] mean value: 0.48441025307382535 key: train_jcc value: [0.57142857 0.54512635 0.57142857 0.58778626 0.56569343 0.5859375 0.58778626 0.58687259 0.56818182 0.56704981] mean value: 0.5737291159872184 MCC on Blind test: 0.29 Accuracy on Blind test: 0.66 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.12921071 0.09907699 0.09061193 0.08828115 0.08906078 0.08684015 0.08911514 0.09572601 0.08669972 0.08724833] mean value: 0.09418709278106689 key: score_time value: [0.01232672 0.0113709 0.01125932 0.01129842 0.01127052 0.01127601 0.01118422 0.01114106 0.01104665 0.01139307] mean value: 0.011356687545776368 key: test_mcc value: [0.70187922 0.89536735 0.82153646 0.82153646 0.78961518 0.70662625 0.85054348 0.73395976 0.81229162 0.85538121] mean value: 0.7988736984470703 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.85454545 0.94545455 0.90909091 0.90909091 0.89090909 0.85454545 0.92727273 0.87037037 0.90740741 0.92592593] mean value: 0.8994612794612794 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.80952381 0.93877551 0.89795918 0.89795918 0.88 0.83333333 0.91304348 0.84444444 0.89361702 0.9047619 ] mean value: 0.8813417869151978 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.89473684 0.88461538 0.84615385 0.84615385 0.81481481 0.8 0.91304348 0.86363636 0.875 1. ] mean value: 0.8738154575740388 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.73913043 1. 0.95652174 0.95652174 0.95652174 0.86956522 0.91304348 0.82608696 0.91304348 0.82608696] mean value: 0.8956521739130434 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83831522 0.953125 0.91576087 0.91576087 0.90013587 0.85665761 0.92527174 0.86465638 0.90813464 0.91304348] mean value: 0.899086167601683 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68 0.88461538 0.81481481 0.81481481 0.78571429 0.71428571 0.84 0.73076923 0.80769231 0.82608696] mean value: 0.7898793509228291 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.05119705 0.08430624 0.08820415 0.07830334 0.07586288 0.06692553 0.0715065 0.08402658 0.04372358 0.05681276] mean value: 0.07008686065673828 key: score_time value: [0.01247501 0.02614236 0.02369547 0.01927376 0.02228785 0.0220902 0.0245657 0.02134919 0.01254559 0.01263094] mean value: 0.019705605506896973 key: test_mcc value: [0.66176788 0.49468252 0.48454371 0.55857122 0.56841568 0.62586896 0.4299228 0.5744289 0.62131837 0.65774086] mean value: 0.5677260905292308 key: train_mcc value: [0.8096421 0.81315076 0.82082225 0.80846845 0.80013948 0.80241214 0.83501834 0.80555438 0.79682809 0.80948363] mean value: 0.8101519608539156 key: test_accuracy value: [0.83636364 0.74545455 0.74545455 0.78181818 0.78181818 0.8 0.72727273 0.77777778 0.81481481 0.83333333] mean value: 0.7844107744107744 key: train_accuracy value: [0.90650407 0.90853659 0.91260163 0.90650407 0.90243902 0.90243902 0.91869919 0.90466531 0.90060852 0.90669371] mean value: 0.9069691122874718 key: test_fscore value: [0.79069767 0.72 0.70833333 0.75 0.76 0.79245283 0.63414634 0.76923077 0.7826087 0.8 ] mean value: 0.7507469644286975 key: train_fscore value: [0.89099526 0.89260143 0.89638554 0.88942308 0.88461538 0.88732394 0.90566038 0.88836105 0.88305489 0.89047619] mean value: 0.8908897145580277 key: test_precision value: [0.85 0.66666667 0.68 0.72 0.7037037 0.7 0.72222222 0.68965517 0.7826087 0.81818182] mean value: 0.7333038278840378 key: train_precision value: [0.8744186 0.88207547 0.89423077 0.88516746 0.88038278 0.8630137 0.88479263 0.87383178 0.87264151 0.87793427] mean value: 0.8788488967608109 key: test_recall value: [0.73913043 0.7826087 0.73913043 0.7826087 0.82608696 0.91304348 0.56521739 0.86956522 0.7826087 0.7826087 ] mean value: 0.7782608695652173 key: train_recall value: [0.90821256 0.90338164 0.89855072 0.89371981 0.88888889 0.91304348 0.92753623 0.90338164 0.89371981 0.90338164] mean value: 0.9033816425120773 key: test_roc_auc value: [0.82269022 0.75067935 0.74456522 0.78192935 0.78804348 0.81589674 0.7044837 0.78962132 0.81065919 0.82678822] mean value: 0.7835356767180925 key: train_roc_auc value: [0.90673786 0.90783117 0.91067887 0.90475464 0.9005848 0.90389016 0.91990847 0.90448802 0.89965711 0.90623628] mean value: 0.9064767370945861 key: test_jcc value: [0.65384615 0.5625 0.5483871 0.6 0.61290323 0.65625 0.46428571 0.625 0.64285714 0.66666667] mean value: 0.6032696000236323 key: train_jcc value: [0.8034188 0.80603448 0.81222707 0.8008658 0.79310345 0.79746835 0.82758621 0.7991453 0.79059829 0.80257511] mean value: 0.8033022867921553 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02037907 0.01109266 0.00997972 0.00990462 0.00994921 0.00996852 0.00993156 0.0100584 0.01054645 0.01049662] mean value: 0.011230683326721192 key: score_time value: [0.00996089 0.00915313 0.00902486 0.00884271 0.00889897 0.00887513 0.00885701 0.0089767 0.00954866 0.0093143 ] mean value: 0.009145236015319825 key: test_mcc value: [0.4105162 0.54684566 0.44324972 0.48454371 0.63259873 0.5262129 0.43189061 0.47706807 0.54201786 0.36008804] mean value: 0.4855031497137916 key: train_mcc value: [0.53047441 0.53357737 0.53918167 0.54965675 0.51912852 0.5401765 0.53276187 0.55541609 0.55003098 0.53609212] mean value: 0.538649627835123 key: test_accuracy value: [0.70909091 0.78181818 0.72727273 0.74545455 0.81818182 0.76363636 0.72727273 0.74074074 0.77777778 0.68518519] mean value: 0.7476430976430977 key: train_accuracy value: [0.7703252 0.77439024 0.77642276 0.7804878 0.76422764 0.77642276 0.7703252 0.78296146 0.77890467 0.77079108] mean value: 0.7745258826827619 key: test_fscore value: [0.66666667 0.71428571 0.68085106 0.70833333 0.79166667 0.73469388 0.65116279 0.70833333 0.72727273 0.63829787] mean value: 0.7021564045977349 key: train_fscore value: [0.73031026 0.72180451 0.72906404 0.73913043 0.72511848 0.73170732 0.73411765 0.74340528 0.74352941 0.73781903] mean value: 0.7336006408609944 key: test_precision value: [0.64 0.78947368 0.66666667 0.68 0.76 0.69230769 0.7 0.68 0.76190476 0.625 ] mean value: 0.6995352805089647 key: train_precision value: [0.72169811 0.75 0.74371859 0.73913043 0.71162791 0.73891626 0.71559633 0.73809524 0.72477064 0.70982143] mean value: 0.7293374943233091 key: test_recall value: [0.69565217 0.65217391 0.69565217 0.73913043 0.82608696 0.7826087 0.60869565 0.73913043 0.69565217 0.65217391] mean value: 0.7086956521739131 key: train_recall value: [0.73913043 0.69565217 0.71497585 0.73913043 0.73913043 0.72463768 0.75362319 0.74879227 0.76328502 0.76811594] mean value: 0.7386473429951691 key: test_roc_auc value: [0.70720109 0.76358696 0.72282609 0.74456522 0.81929348 0.76630435 0.71059783 0.74053296 0.76718093 0.68092567] mean value: 0.7423014551192146 key: train_roc_auc value: [0.76605645 0.76361556 0.76801424 0.77482838 0.76079329 0.76933638 0.76803966 0.77824229 0.77674741 0.77042161] mean value: 0.7696095259939654 key: test_jcc value: [0.5 0.55555556 0.51612903 0.5483871 0.65517241 0.58064516 0.48275862 0.5483871 0.57142857 0.46875 ] mean value: 0.542721354856366 key: train_jcc value: [0.57518797 0.56470588 0.57364341 0.5862069 0.56877323 0.57692308 0.57992565 0.59160305 0.5917603 0.58455882] mean value: 0.5793288297953626 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01503873 0.01986313 0.02682614 0.02075624 0.02262497 0.02537298 0.01910257 0.02685332 0.02298331 0.02089858] mean value: 0.022031998634338378 key: score_time value: [0.01068616 0.01125503 0.01192212 0.01192665 0.01190472 0.01192307 0.01186848 0.01198268 0.01190519 0.0118885 ] mean value: 0.0117262601852417 key: test_mcc value: [0.54964723 0.49540572 0.51276506 0.43420774 0.73839363 0.61131498 0.36490022 0.64953583 0.61883928 0.60137424] mean value: 0.5576383940609018 key: train_mcc value: [0.68569252 0.50562368 0.69831943 0.61884838 0.67762357 0.75564617 0.64996995 0.76602498 0.70329096 0.65621393] mean value: 0.671725355999401 key: test_accuracy value: [0.78181818 0.67272727 0.70909091 0.67272727 0.87272727 0.8 0.69090909 0.81481481 0.81481481 0.7962963 ] mean value: 0.7625925925925926 key: train_accuracy value: [0.84756098 0.68699187 0.83536585 0.7703252 0.83943089 0.87398374 0.82723577 0.88235294 0.85598377 0.8296146 ] mean value: 0.8248845627401508 key: test_fscore value: [0.7 0.71875 0.73333333 0.7 0.8372093 0.78431373 0.48484848 0.80769231 0.77272727 0.7027027 ] mean value: 0.7241577129119878 key: train_fscore value: [0.80818414 0.72695035 0.83018868 0.78393881 0.77994429 0.86222222 0.76454294 0.86757991 0.82555283 0.76536313] mean value: 0.8014467302533417 key: test_precision value: [0.82352941 0.56097561 0.59459459 0.56756757 0.9 0.71428571 0.8 0.72413793 0.80952381 0.92857143] mean value: 0.7423186067098401 key: train_precision value: [0.85869565 0.57422969 0.73333333 0.64873418 0.92105263 0.79835391 0.8961039 0.82251082 0.84 0.90728477] mean value: 0.8000298882469794 key: test_recall value: [0.60869565 1. 0.95652174 0.91304348 0.7826087 0.86956522 0.34782609 0.91304348 0.73913043 0.56521739] mean value: 0.7695652173913043 key: train_recall value: [0.76328502 0.99033816 0.95652174 0.99033816 0.6763285 0.93719807 0.66666667 0.9178744 0.8115942 0.66183575] mean value: 0.8371980676328502 key: test_roc_auc value: [0.75747283 0.71875 0.74388587 0.70652174 0.86005435 0.80978261 0.64266304 0.82748948 0.80504909 0.76647966] mean value: 0.7638148667601683 key: train_roc_auc value: [0.83602848 0.72850242 0.85194508 0.80043224 0.81711162 0.88263412 0.80526316 0.88725888 0.84985305 0.80644235] mean value: 0.826547138343477 key: test_jcc value: [0.53846154 0.56097561 0.57894737 0.53846154 0.72 0.64516129 0.32 0.67741935 0.62962963 0.54166667] mean value: 0.5750722996557813 key: train_jcc value: [0.67811159 0.57103064 0.70967742 0.64465409 0.63926941 0.7578125 0.61883408 0.76612903 0.70292887 0.6199095 ] mean value: 0.6708357127980087 MCC on Blind test: 0.37 Accuracy on Blind test: 0.6 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02388692 0.02631426 0.02282667 0.0212965 0.02375722 0.02388358 0.02595091 0.02372169 0.02637124 0.02537918] mean value: 0.024338817596435545 key: score_time value: [0.01199365 0.01192498 0.01190543 0.01196766 0.01190972 0.01190233 0.01212358 0.01190972 0.01263452 0.01318789] mean value: 0.01214594841003418 key: test_mcc value: [0.46046933 0.42210145 0.47166751 0.62311394 0.48270989 0.22282609 0.38208785 0.63798041 0.61030357 0.50530306] mean value: 0.4818563094696725 key: train_mcc value: [0.66191422 0.55427156 0.5437853 0.48759994 0.59774701 0.40770208 0.68941608 0.76921545 0.6461682 0.72594029] mean value: 0.6083760151782537 key: test_accuracy value: [0.72727273 0.70909091 0.65454545 0.8 0.74545455 0.63636364 0.69090909 0.7962963 0.77777778 0.75925926] mean value: 0.7296969696969697 key: train_accuracy value: [0.83130081 0.76626016 0.71544715 0.73170732 0.79065041 0.68699187 0.84552846 0.87626775 0.79513185 0.86409736] mean value: 0.7903383136265439 key: test_fscore value: [0.54545455 0.5 0.70769231 0.68571429 0.61111111 0.375 0.4516129 0.8 0.78571429 0.71111111] mean value: 0.6173410550023453 key: train_fscore value: [0.76619718 0.62295082 0.74545455 0.54166667 0.67711599 0.40769231 0.79005525 0.86825054 0.8 0.84454756] mean value: 0.706393086242575 key: test_precision value: [0.9 0.88888889 0.54761905 1. 0.84615385 0.66666667 0.875 0.6875 0.66666667 0.72727273] mean value: 0.7805767843267843 key: train_precision value: [0.91891892 0.96938776 0.59766764 0.96296296 0.96428571 1. 0.92258065 0.78515625 0.67785235 0.8125 ] mean value: 0.861131223390818 key: test_recall value: [0.39130435 0.34782609 1. 0.52173913 0.47826087 0.26086957 0.30434783 0.95652174 0.95652174 0.69565217] mean value: 0.591304347826087 key: train_recall value: [0.65700483 0.4589372 0.99033816 0.37681159 0.52173913 0.25603865 0.69082126 0.97101449 0.97584541 0.87922705] mean value: 0.6777777777777778 key: test_roc_auc value: [0.68002717 0.65828804 0.703125 0.76086957 0.70788043 0.58355978 0.63654891 0.81697055 0.80084151 0.75105189] mean value: 0.7099162868162693 key: train_roc_auc value: [0.80744978 0.72420544 0.75306382 0.68314264 0.75385202 0.62801932 0.824358 0.8893534 0.82009054 0.86618695] mean value: 0.774972191551139 key: test_jcc value: [0.375 0.33333333 0.54761905 0.52173913 0.44 0.23076923 0.29166667 0.66666667 0.64705882 0.55172414] mean value: 0.4605577036950174 key: train_jcc value: [0.62100457 0.45238095 0.5942029 0.37142857 0.51184834 0.25603865 0.65296804 0.76717557 0.66666667 0.73092369] mean value: 0.5624637947640064 MCC on Blind test: 0.45 Accuracy on Blind test: 0.69 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.19745541 0.18153834 0.18095517 0.18258262 0.18196273 0.18145657 0.18402338 0.20013261 0.21202493 0.21415091] mean value: 0.19162826538085936 key: score_time value: [0.01559019 0.0153532 0.01577616 0.01554918 0.01544118 0.01554728 0.01582265 0.01741314 0.01674986 0.01804852] mean value: 0.016129136085510254 key: test_mcc value: [0.70187922 0.82153646 0.78065376 0.64214885 0.78961518 0.74770557 0.66559476 0.6970547 0.72464276 0.82092207] mean value: 0.7391753321463528 key: train_mcc value: [0.94597304 0.93750179 0.95410967 0.95022532 0.96671882 0.94996186 0.95022532 0.95845152 0.94591604 0.9460534 ] mean value: 0.9505136775538949 key: test_accuracy value: [0.85454545 0.90909091 0.89090909 0.81818182 0.89090909 0.87272727 0.83636364 0.85185185 0.85185185 0.90740741] mean value: 0.8683838383838384 key: train_accuracy value: [0.97357724 0.9695122 0.97764228 0.97560976 0.98373984 0.97560976 0.97560976 0.97971602 0.97363083 0.97363083] mean value: 0.9758278500634905 key: test_fscore value: [0.80952381 0.89795918 0.875 0.8 0.88 0.85714286 0.7804878 0.82608696 0.84615385 0.87804878] mean value: 0.8450403238381575 key: train_fscore value: [0.96882494 0.96385542 0.97336562 0.97129187 0.98076923 0.97101449 0.97129187 0.97596154 0.9686747 0.96882494] mean value: 0.9713874612053074 key: test_precision value: [0.89473684 0.84615385 0.84 0.74074074 0.81481481 0.80769231 0.88888889 0.82608696 0.75862069 1. ] mean value: 0.8417735086572773 key: train_precision value: [0.96190476 0.96153846 0.97572816 0.96208531 0.97607656 0.97101449 0.96208531 0.97129187 0.96634615 0.96190476] mean value: 0.9669975824453944 key: test_recall value: [0.73913043 0.95652174 0.91304348 0.86956522 0.95652174 0.91304348 0.69565217 0.82608696 0.95652174 0.7826087 ] mean value: 0.8608695652173913 key: train_recall value: [0.97584541 0.96618357 0.97101449 0.98067633 0.98550725 0.97101449 0.98067633 0.98067633 0.97101449 0.97584541] mean value: 0.9758454106280193 key: test_roc_auc value: [0.83831522 0.91576087 0.89402174 0.82540761 0.90013587 0.87839674 0.81657609 0.84852735 0.86535764 0.89130435] mean value: 0.8673803471248247 key: train_roc_auc value: [0.97388762 0.9690567 0.97673532 0.97630308 0.98398169 0.97498093 0.97630308 0.97984865 0.97326948 0.97393669] mean value: 0.975830324011102 key: test_jcc value: [0.68 0.81481481 0.77777778 0.66666667 0.78571429 0.75 0.64 0.7037037 0.73333333 0.7826087 ] mean value: 0.7334619277662756 key: train_jcc value: [0.93953488 0.93023256 0.94811321 0.94418605 0.96226415 0.94366197 0.94418605 0.95305164 0.93925234 0.93953488] mean value: 0.9444017728567289 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.07939768 0.05831194 0.07030249 0.07544374 0.10521579 0.0780127 0.09002948 0.08250356 0.08563209 0.09377265] mean value: 0.08186221122741699 key: score_time value: [0.02362895 0.01924038 0.02084756 0.02889061 0.01976109 0.01948786 0.03774977 0.02285075 0.03961301 0.02583504] mean value: 0.02579050064086914 key: test_mcc value: [0.66176788 0.92870878 0.74770557 0.74055136 0.70662625 0.70662625 0.81260451 0.62011507 0.78645618 0.890415 ] mean value: 0.7601576842240911 key: train_mcc value: [0.97918627 0.97080023 0.98332062 0.96681345 0.98749079 0.98749079 0.97505096 0.98334516 0.98340136 0.9875091 ] mean value: 0.9804408715224007 key: test_accuracy value: [0.83636364 0.96363636 0.87272727 0.87272727 0.85454545 0.85454545 0.90909091 0.81481481 0.88888889 0.94444444] mean value: 0.8811784511784512 key: train_accuracy value: [0.9898374 0.98577236 0.99186992 0.98373984 0.99390244 0.99390244 0.98780488 0.99188641 0.99188641 0.99391481] mean value: 0.9904516895067531 key: test_fscore value: [0.79069767 0.95833333 0.85714286 0.85106383 0.83333333 0.83333333 0.88888889 0.76190476 0.88 0.93023256] mean value: 0.8584930570281881 key: train_fscore value: [0.98783455 0.98305085 0.99033816 0.98039216 0.99273608 0.99273608 0.98536585 0.99033816 0.99038462 0.99273608] mean value: 0.9885912584189805 key: test_precision value: [0.85 0.92 0.80769231 0.83333333 0.8 0.8 0.90909091 0.84210526 0.81481481 1. ] mean value: 0.857703662808926 key: train_precision value: [0.99509804 0.98543689 0.99033816 0.99502488 0.99514563 0.99514563 0.99507389 0.99033816 0.98564593 0.99514563] mean value: 0.9922392854387729 key: test_recall value: [0.73913043 1. 0.91304348 0.86956522 0.86956522 0.86956522 0.86956522 0.69565217 0.95652174 0.86956522] mean value: 0.8652173913043478 key: train_recall value: [0.98067633 0.98067633 0.99033816 0.96618357 0.99033816 0.99033816 0.97584541 0.99033816 0.99516908 0.99033816] mean value: 0.985024154589372 key: test_roc_auc value: [0.82269022 0.96875 0.87839674 0.87228261 0.85665761 0.85665761 0.90353261 0.79943899 0.89761571 0.93478261] mean value: 0.8790804698457223 key: train_roc_auc value: [0.98858378 0.98507501 0.99166031 0.9813374 0.9934147 0.9934147 0.98616832 0.99167258 0.99233979 0.99342083] mean value: 0.9897087402808227 key: test_jcc value: [0.65384615 0.92 0.75 0.74074074 0.71428571 0.71428571 0.8 0.61538462 0.78571429 0.86956522] mean value: 0.7563822441648529 key: train_jcc value: [0.97596154 0.96666667 0.98086124 0.96153846 0.98557692 0.98557692 0.97115385 0.98086124 0.98095238 0.98557692] mean value: 0.977472615104194 MCC on Blind test: 0.7 Accuracy on Blind test: 0.85 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.16095352 0.21311998 0.11064672 0.18200755 0.18301582 0.18330669 0.18382049 0.18628192 0.18736124 0.20192957] mean value: 0.17924435138702394 key: score_time value: [0.01886606 0.02653193 0.0213213 0.0254035 0.02533579 0.02538133 0.02534842 0.02519035 0.02613616 0.02551413] mean value: 0.024502897262573244 key: test_mcc value: [0.39011901 0.51163988 0.24416604 0.44324972 0.43189061 0.4105162 0.43540317 0.10592543 0.30642689 0.38376294] mean value: 0.3663099878731091 key: train_mcc value: [0.96671882 0.94992874 0.96671882 0.9708388 0.94996186 0.95410967 0.96245495 0.96255992 0.95000174 0.96255992] mean value: 0.9595853232398288 key: test_accuracy value: [0.70909091 0.76363636 0.63636364 0.72727273 0.72727273 0.70909091 0.72727273 0.57407407 0.66666667 0.7037037 ] mean value: 0.6944444444444444 key: train_accuracy value: [0.98373984 0.97560976 0.98373984 0.98577236 0.97560976 0.97764228 0.98170732 0.98174442 0.97565923 0.98174442] mean value: 0.9802969211233694 key: test_fscore value: [0.6 0.71111111 0.54545455 0.68085106 0.65116279 0.66666667 0.59459459 0.43902439 0.57142857 0.6 ] mean value: 0.6060293734026854 key: train_fscore value: [0.98076923 0.97087379 0.98076923 0.98313253 0.97101449 0.97336562 0.97820823 0.97831325 0.97087379 0.97831325] mean value: 0.9765633413131132 key: test_precision value: [0.70588235 0.72727273 0.57142857 0.66666667 0.7 0.64 0.78571429 0.5 0.63157895 0.70588235] mean value: 0.6634425904333026 key: train_precision value: [0.97607656 0.97560976 0.97607656 0.98076923 0.97101449 0.97572816 0.98058252 0.97596154 0.97560976 0.97596154] mean value: 0.976339010230055 key: test_recall value: [0.52173913 0.69565217 0.52173913 0.69565217 0.60869565 0.69565217 0.47826087 0.39130435 0.52173913 0.52173913] mean value: 0.5652173913043478 key: train_recall value: [0.98550725 0.96618357 0.98550725 0.98550725 0.97101449 0.97101449 0.97584541 0.98067633 0.96618357 0.98067633] mean value: 0.9768115942028985 key: test_roc_auc value: [0.68274457 0.75407609 0.62024457 0.72282609 0.71059783 0.70720109 0.69225543 0.55049088 0.64796634 0.6802244 ] mean value: 0.6768627279102384 key: train_roc_auc value: [0.98398169 0.97431986 0.98398169 0.98573608 0.97498093 0.97673532 0.98090516 0.98159691 0.97435053 0.98159691] mean value: 0.9798185071983698 key: test_jcc value: [0.42857143 0.55172414 0.375 0.51612903 0.48275862 0.5 0.42307692 0.28125 0.4 0.42857143] mean value: 0.43870815710985345 key: train_jcc value: [0.96226415 0.94339623 0.96226415 0.96682464 0.94366197 0.94811321 0.95734597 0.95754717 0.94339623 0.95754717] mean value: 0.9542360889831523 MCC on Blind test: 0.32 Accuracy on Blind test: 0.67 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.75554276 0.74635649 0.74191046 0.74311304 0.73715711 0.75417137 0.74482107 0.73726082 0.75659513 0.74458671] mean value: 0.7461514949798584 key: score_time value: [0.00954223 0.0093863 0.00966144 0.00947142 0.00959969 0.01022458 0.00977397 0.00938845 0.00977755 0.00960946] mean value: 0.00964350700378418 key: test_mcc value: [0.66559476 0.85468127 0.82153646 0.78961518 0.74770557 0.82153646 0.77526165 0.5802059 0.68012012 0.890415 ] mean value: 0.762667238203967 key: train_mcc value: [0.99584156 0.99166031 0.99583607 0.99583607 0.99166031 1. 0.99168385 1. 1. 1. ] mean value: 0.9962518155398403 key: test_accuracy value: [0.83636364 0.92727273 0.90909091 0.89090909 0.87272727 0.90909091 0.89090909 0.7962963 0.83333333 0.94444444] mean value: 0.8810437710437711 key: train_accuracy value: [0.99796748 0.99593496 0.99796748 0.99796748 0.99593496 1. 0.99593496 1. 1. 1. ] mean value: 0.9981707317073171 key: test_fscore value: [0.7804878 0.91666667 0.89795918 0.88 0.85714286 0.89795918 0.86363636 0.74418605 0.82352941 0.93023256] mean value: 0.8591800076086744 key: train_fscore value: [0.99759036 0.99516908 0.99757869 0.99757869 0.99516908 1. 0.99514563 1. 1. 1. ] mean value: 0.9978231541752846 key: test_precision value: [0.88888889 0.88 0.84615385 0.81481481 0.80769231 0.84615385 0.9047619 0.8 0.75 1. ] mean value: 0.8538465608465609 key: train_precision value: [0.99519231 0.99516908 1. 1. 0.99516908 1. 1. 1. 1. 1. ] mean value: 0.9985530471943516 key: test_recall value: [0.69565217 0.95652174 0.95652174 0.95652174 0.91304348 0.95652174 0.82608696 0.69565217 0.91304348 0.86956522] mean value: 0.8739130434782608 key: train_recall value: [1. 0.99516908 0.99516908 0.99516908 0.99516908 1. 0.99033816 1. 1. 1. ] mean value: 0.9971014492753623 key: test_roc_auc value: [0.81657609 0.93138587 0.91576087 0.90013587 0.87839674 0.91576087 0.88179348 0.78330996 0.84361851 0.93478261] mean value: 0.8801520862552594 key: train_roc_auc value: [0.99824561 0.99583016 0.99758454 0.99758454 0.99583016 1. 0.99516908 1. 1. 1. ] mean value: 0.9980244088482074 key: test_jcc value: [0.64 0.84615385 0.81481481 0.78571429 0.75 0.81481481 0.76 0.59259259 0.7 0.86956522] mean value: 0.7573655571481658 key: train_jcc value: [0.99519231 0.99038462 0.99516908 0.99516908 0.99038462 1. 0.99033816 1. 1. 1. ] mean value: 0.9956637866963954 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.0453124 0.03122139 0.03177643 0.03254819 0.03172731 0.03109336 0.03182578 0.03118396 0.0318737 0.03185582] mean value: 0.03304183483123779 key: score_time value: [0.01310349 0.01280475 0.01356483 0.01508045 0.01738858 0.01533246 0.01550055 0.01521564 0.01526809 0.01532745] mean value: 0.014858627319335937 key: test_mcc value: [0.13987572 0.07608696 0.00162269 0.26809513 0.32375563 0.34977196 0.24522056 0.0837414 0.28025234 0.10942918] mean value: 0.18778515831908477 key: train_mcc value: [0.32061883 0.32955723 0.33251059 0.30850027 0.30543128 0.30543128 0.32061883 0.32267058 0.30454779 0.31065012] mean value: 0.31605367928568373 key: test_accuracy value: [0.47272727 0.47272727 0.45454545 0.50909091 0.54545455 0.56363636 0.52727273 0.48148148 0.57407407 0.48148148] mean value: 0.5082491582491582 key: train_accuracy value: [0.54471545 0.55081301 0.55284553 0.53658537 0.53455285 0.53455285 0.54471545 0.54563895 0.53346856 0.53752535] mean value: 0.5415413347845446 key: test_fscore value: [0.60273973 0.57971014 0.54545455 0.63013699 0.64788732 0.65714286 0.62857143 0.58823529 0.64615385 0.6 ] mean value: 0.6126032152640289 key: train_fscore value: [0.64890282 0.6519685 0.65299685 0.64485981 0.64385692 0.64385692 0.64890282 0.64890282 0.64285714 0.64485981] mean value: 0.6471964423706671 key: test_precision value: [0.44 0.43478261 0.41860465 0.46 0.47916667 0.4893617 0.46808511 0.44444444 0.5 0.44680851] mean value: 0.458125369011849 key: train_precision value: [0.48027842 0.48364486 0.48477752 0.47586207 0.47477064 0.47477064 0.48027842 0.48027842 0.47368421 0.47586207] mean value: 0.47842072770598526 key: test_recall value: [0.95652174 0.86956522 0.7826087 1. 1. 1. 0.95652174 0.86956522 0.91304348 0.91304348] mean value: 0.9260869565217391 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.54076087 0.52853261 0.50067935 0.578125 0.609375 0.625 0.58763587 0.5315568 0.61781206 0.5371669 ] mean value: 0.5656644460028051 key: train_roc_auc value: [0.60701754 0.6122807 0.61403509 0.6 0.59824561 0.59824561 0.60701754 0.60839161 0.5979021 0.6013986 ] mean value: 0.6044534412955466 key: test_jcc value: [0.43137255 0.40816327 0.375 0.46 0.47916667 0.4893617 0.45833333 0.41666667 0.47727273 0.42857143] mean value: 0.44239083389642125 key: train_jcc value: [0.48027842 0.48364486 0.48477752 0.47586207 0.47477064 0.47477064 0.48027842 0.48027842 0.47368421 0.47586207] mean value: 0.47842072770598526 MCC on Blind test: 0.04 Accuracy on Blind test: 0.46 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02550864 0.01593661 0.02783704 0.01576757 0.01569366 0.02469158 0.01627493 0.01625252 0.04488945 0.03936982] mean value: 0.024222183227539062 key: score_time value: [0.0141151 0.01227522 0.01901174 0.01219106 0.01244736 0.01307368 0.01247072 0.01240945 0.0189209 0.01890063] mean value: 0.014581584930419922 key: test_mcc value: [0.62436244 0.59190054 0.50741958 0.56841568 0.63259873 0.65508136 0.51203338 0.66901612 0.66155709 0.54503297] mean value: 0.5967417897513163 key: train_mcc value: [0.75934031 0.76482813 0.78384839 0.74766434 0.74663867 0.76868204 0.7969552 0.8060331 0.76963028 0.78896998] mean value: 0.7732590448828639 key: test_accuracy value: [0.81818182 0.8 0.74545455 0.78181818 0.81818182 0.81818182 0.76363636 0.83333333 0.83333333 0.77777778] mean value: 0.798989898989899 key: train_accuracy value: [0.88211382 0.88414634 0.89430894 0.87601626 0.87601626 0.88617886 0.9004065 0.90466531 0.88640974 0.89655172] mean value: 0.8886813766717789 key: test_fscore value: [0.76190476 0.76595745 0.73076923 0.76 0.79166667 0.80769231 0.66666667 0.81632653 0.80851064 0.7 ] mean value: 0.7609494249418262 key: train_fscore value: [0.86190476 0.86588235 0.87559809 0.85579196 0.85441527 0.86792453 0.88361045 0.88888889 0.8685446 0.87885986] mean value: 0.870142076452663 key: test_precision value: [0.84210526 0.75 0.65517241 0.7037037 0.76 0.72413793 0.8125 0.76923077 0.79166667 0.82352941] mean value: 0.7632046159351327 key: train_precision value: [0.84976526 0.8440367 0.86729858 0.83796296 0.84433962 0.84792627 0.86915888 0.87037037 0.84474886 0.86448598] mean value: 0.8540093475179242 key: test_recall value: [0.69565217 0.7826087 0.82608696 0.82608696 0.82608696 0.91304348 0.56521739 0.86956522 0.82608696 0.60869565] mean value: 0.7739130434782608 key: train_recall value: [0.87439614 0.88888889 0.88405797 0.87439614 0.8647343 0.88888889 0.89855072 0.90821256 0.89371981 0.89371981] mean value: 0.8869565217391304 key: test_roc_auc value: [0.80095109 0.79755435 0.75679348 0.78804348 0.81929348 0.83152174 0.7357337 0.83800842 0.83239832 0.75596073] mean value: 0.7956258765778401 key: train_roc_auc value: [0.88105772 0.88479532 0.89290618 0.87579456 0.87447241 0.88654971 0.90015256 0.90515523 0.88741934 0.8961606 ] mean value: 0.8884463629429304 key: test_jcc value: [0.61538462 0.62068966 0.57575758 0.61290323 0.65517241 0.67741935 0.5 0.68965517 0.67857143 0.53846154] mean value: 0.616401498019963 key: train_jcc value: [0.75732218 0.76348548 0.7787234 0.74793388 0.74583333 0.76666667 0.79148936 0.8 0.76763485 0.78389831] mean value: 0.7702987463022138 MCC on Blind test: 0.43 Accuracy on Blind test: 0.73 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.21079087 0.22206306 0.3546133 0.26670313 0.21922231 0.24915075 0.30519938 0.30639625 0.29098701 0.28081584] mean value: 0.2705941915512085 key: score_time value: [0.0122683 0.01256919 0.02083921 0.01405358 0.01921558 0.01912498 0.01912189 0.01893353 0.02347326 0.02534485] mean value: 0.018494439125061036 key: test_mcc value: [0.62436244 0.59190054 0.50741958 0.56841568 0.63259873 0.65508136 0.54964723 0.66901612 0.66155709 0.65775818] mean value: 0.6117756954470762 key: train_mcc value: [0.75934031 0.76482813 0.78384839 0.74766434 0.74663867 0.76868204 0.81356476 0.8060331 0.76963028 0.79247793] mean value: 0.7752707961049518 key: test_accuracy value: [0.81818182 0.8 0.74545455 0.78181818 0.81818182 0.81818182 0.78181818 0.83333333 0.83333333 0.83333333] mean value: 0.8063636363636364 key: train_accuracy value: [0.88211382 0.88414634 0.89430894 0.87601626 0.87601626 0.88617886 0.90853659 0.90466531 0.88640974 0.89858012] mean value: 0.8896972245584525 key: test_fscore value: [0.76190476 0.76595745 0.73076923 0.76 0.79166667 0.80769231 0.7 0.81632653 0.80851064 0.79069767] mean value: 0.77335252571702 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:115: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:118: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.86190476 0.86588235 0.87559809 0.85579196 0.85441527 0.86792453 0.89311164 0.88888889 0.8685446 0.88038278] mean value: 0.8712444869812518 key: test_precision value: [0.84210526 0.75 0.65517241 0.7037037 0.76 0.72413793 0.82352941 0.76923077 0.79166667 0.85 ] mean value: 0.7669546159351326 key: train_precision value: [0.84976526 0.8440367 0.86729858 0.83796296 0.84433962 0.84792627 0.87850467 0.87037037 0.84474886 0.87203791] mean value: 0.8556991202955297 key: test_recall value: [0.69565217 0.7826087 0.82608696 0.82608696 0.82608696 0.91304348 0.60869565 0.86956522 0.82608696 0.73913043] mean value: 0.7913043478260869 key: train_recall value: [0.87439614 0.88888889 0.88405797 0.87439614 0.8647343 0.88888889 0.90821256 0.90821256 0.89371981 0.88888889] mean value: 0.8874396135265701 key: test_roc_auc value: [0.80095109 0.79755435 0.75679348 0.78804348 0.81929348 0.83152174 0.75747283 0.83800842 0.83239832 0.82117812] mean value: 0.8043215287517531 key: train_roc_auc value: [0.88105772 0.88479532 0.89290618 0.87579456 0.87447241 0.88654971 0.90849225 0.90515523 0.88741934 0.89724165] mean value: 0.889388436379283 key: test_jcc value: [0.61538462 0.62068966 0.57575758 0.61290323 0.65517241 0.67741935 0.53846154 0.68965517 0.67857143 0.65384615] mean value: 0.6317861134045784 key: train_jcc value: [0.75732218 0.76348548 0.7787234 0.74793388 0.74583333 0.76666667 0.80686695 0.8 0.76763485 0.78632479] mean value: 0.7720791535349751 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.0372808 0.03722477 0.04446268 0.03734994 0.03764439 0.03813338 0.03796172 0.0386374 0.03517652 0.03762078] mean value: 0.038149237632751465 key: score_time value: [0.01614666 0.02019405 0.01651931 0.01990747 0.01505542 0.01490974 0.01517749 0.01496434 0.01227784 0.0151968 ] mean value: 0.016034913063049317 key: test_mcc value: [0.625 0.62622429 0.53150959 0.8125 0.57258185 0.68245968 0.61982085 0.41661348 0.61445255 0.56449867] mean value: 0.606566095735174 key: train_mcc value: [0.72016767 0.71653529 0.72107594 0.73312189 0.73753515 0.7770742 0.74199614 0.72063201 0.7422532 0.73557441] mean value: 0.7345965893718394 key: test_accuracy value: [0.8125 0.8125 0.765625 0.90625 0.77777778 0.84126984 0.80952381 0.6984127 0.79365079 0.77777778] mean value: 0.7995287698412699 key: train_accuracy value: [0.85964912 0.85789474 0.85964912 0.86491228 0.86865149 0.88791594 0.8704028 0.85989492 0.8704028 0.86690018] mean value: 0.8666273389252466 key: test_fscore value: [0.8125 0.81818182 0.76190476 0.90625 0.80555556 0.84375 0.81818182 0.73239437 0.81690141 0.75 ] mean value: 0.8065619728471841 key: train_fscore value: [0.8630137 0.86106346 0.86440678 0.87102178 0.87001733 0.89078498 0.87372014 0.86348123 0.87457627 0.87162162] mean value: 0.8703707290626053 key: test_precision value: [0.8125 0.79411765 0.77419355 0.90625 0.725 0.84375 0.79411765 0.65 0.725 0.84 ] mean value: 0.7864928842504744 key: train_precision value: [0.84280936 0.84228188 0.83606557 0.83333333 0.85958904 0.86710963 0.85049834 0.84333333 0.84868421 0.84313725] mean value: 0.8466841964126378 key: test_recall value: [0.8125 0.84375 0.75 0.90625 0.90625 0.84375 0.84375 0.83870968 0.93548387 0.67741935] mean value: 0.8357862903225807 key: train_recall value: [0.88421053 0.88070175 0.89473684 0.9122807 0.88070175 0.91578947 0.89824561 0.88461538 0.9020979 0.9020979 ] mean value: 0.8955477855477856 key: test_roc_auc value: [0.8125 0.8125 0.765625 0.90625 0.77570565 0.84122984 0.80897177 0.70060484 0.79586694 0.77620968] mean value: 0.799546370967742 key: train_roc_auc value: [0.85964912 0.85789474 0.85964912 0.86491228 0.86867256 0.88796467 0.87045148 0.85985155 0.8703472 0.86683842] mean value: 0.8666231137283769 key: test_jcc value: [0.68421053 0.69230769 0.61538462 0.82857143 0.6744186 0.72972973 0.69230769 0.57777778 0.69047619 0.6 ] mean value: 0.6785184257522079 key: train_jcc value: [0.75903614 0.7560241 0.76119403 0.77151335 0.76993865 0.80307692 0.77575758 0.75975976 0.77710843 0.77245509] mean value: 0.7705864056386634 MCC on Blind test: 0.47 Accuracy on Blind test: 0.74 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.94220519 1.04799438 0.89575052 0.96442008 0.87544394 1.0187161 0.91170669 1.0358994 0.90370107 0.8602612 ] mean value: 0.9456098556518555 key: score_time value: [0.02014327 0.01540089 0.01545882 0.01552725 0.01543331 0.01552343 0.01533651 0.01330423 0.01524282 0.01570821] mean value: 0.015707874298095705 key: test_mcc value: [0.625 0.7276878 0.62994079 0.72192954 0.55909213 0.68245968 0.62325024 0.74772995 0.61895161 0.68352185] mean value: 0.6619563581137236 key: train_mcc value: [0.85618779 0.85315692 0.88486986 0.89473684 0.89527754 0.81249452 0.84607646 0.89868933 0.90912887 0.88532216] mean value: 0.8735940298646814 key: test_accuracy value: [0.8125 0.859375 0.8125 0.859375 0.77777778 0.84126984 0.80952381 0.87301587 0.80952381 0.84126984] mean value: 0.8296130952380952 key: train_accuracy value: [0.92807018 0.92631579 0.94210526 0.94736842 0.9474606 0.90542907 0.92294221 0.94921191 0.95446585 0.94220665] mean value: 0.9365575936338218 key: test_fscore value: [0.8125 0.86956522 0.8 0.85245902 0.79411765 0.84375 0.82352941 0.875 0.80645161 0.83333333] mean value: 0.8310706238844835 key: train_fscore value: [0.92844677 0.92758621 0.94320138 0.94736842 0.94809689 0.90816327 0.92361111 0.94991364 0.9550173 0.94358974] mean value: 0.9374994727336559 key: test_precision value: [0.8125 0.81081081 0.85714286 0.89655172 0.75 0.84375 0.77777778 0.84848485 0.80645161 0.86206897] mean value: 0.8265538596774692 key: train_precision value: [0.92361111 0.91186441 0.92567568 0.94736842 0.93515358 0.88118812 0.91408935 0.93856655 0.94520548 0.92307692] mean value: 0.9245799619557747 key: test_recall value: [0.8125 0.9375 0.75 0.8125 0.84375 0.84375 0.875 0.90322581 0.80645161 0.80645161] mean value: 0.8391129032258065 key: train_recall value: [0.93333333 0.94385965 0.96140351 0.94736842 0.96140351 0.93684211 0.93333333 0.96153846 0.96503497 0.96503497] mean value: 0.9509152251257514 key: test_roc_auc value: [0.8125 0.859375 0.8125 0.859375 0.77671371 0.84122984 0.80846774 0.8734879 0.80947581 0.84072581] mean value: 0.8293850806451613 key: train_roc_auc value: [0.92807018 0.92631579 0.94210526 0.94736842 0.94748497 0.90548399 0.92296037 0.94919028 0.95444731 0.94216661] mean value: 0.9365593178751074 key: test_jcc value: [0.68421053 0.76923077 0.66666667 0.74285714 0.65853659 0.72972973 0.7 0.77777778 0.67567568 0.71428571] mean value: 0.7118970587905119 key: train_jcc value: [0.86644951 0.86495177 0.89250814 0.9 0.90131579 0.8317757 0.85806452 0.90460526 0.91390728 0.89320388] mean value: 0.8826781861170421 MCC on Blind test: 0.49 Accuracy on Blind test: 0.75 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01452422 0.01175213 0.01055312 0.01048565 0.01047111 0.01041317 0.01043606 0.01045132 0.01038289 0.01034546] mean value: 0.010981512069702149 key: score_time value: [0.01245642 0.0096271 0.00930309 0.0090971 0.00905991 0.00908732 0.00912166 0.00913 0.00914192 0.00923848] mean value: 0.009526300430297851 key: test_mcc value: [0.60848698 0.62622429 0.56694671 0.59404013 0.60087592 0.46010298 0.47384924 0.31444802 0.43812738 0.50663549] mean value: 0.5189737164571981 key: train_mcc value: [0.56630842 0.56060684 0.5176281 0.56727781 0.56810211 0.51641866 0.50779222 0.62720405 0.57097112 0.64020763] mean value: 0.5642516961897099 key: test_accuracy value: [0.796875 0.8125 0.78125 0.796875 0.79365079 0.71428571 0.73015873 0.65079365 0.71428571 0.74603175] mean value: 0.7536706349206349 key: train_accuracy value: [0.78245614 0.77894737 0.75789474 0.78245614 0.78283713 0.74430823 0.74956217 0.81260946 0.78458844 0.8178634 ] mean value: 0.7793523212584877 key: test_fscore value: [0.81690141 0.81818182 0.76666667 0.79365079 0.81690141 0.76315789 0.70175439 0.68571429 0.73529412 0.7037037 ] mean value: 0.7601926483167489 key: train_fscore value: [0.78983051 0.78929766 0.76767677 0.79194631 0.79194631 0.77945619 0.72340426 0.82016807 0.79327731 0.82838284] mean value: 0.7875386217571597 key: test_precision value: [0.74358974 0.79411765 0.82142857 0.80645161 0.74358974 0.65909091 0.8 0.61538462 0.67567568 0.82608696] mean value: 0.7485415475243047 key: train_precision value: [0.76393443 0.75399361 0.73786408 0.75884244 0.75884244 0.68435013 0.80603448 0.78964401 0.76375405 0.784375 ] mean value: 0.7601634675219903 key: test_recall value: [0.90625 0.84375 0.71875 0.78125 0.90625 0.90625 0.625 0.77419355 0.80645161 0.61290323] mean value: 0.7881048387096774 key: train_recall value: [0.81754386 0.82807018 0.8 0.82807018 0.82807018 0.90526316 0.65614035 0.85314685 0.82517483 0.87762238] mean value: 0.8219101950680898 key: test_roc_auc value: [0.796875 0.8125 0.78125 0.796875 0.79183468 0.71118952 0.73185484 0.65272177 0.71572581 0.74395161] mean value: 0.7534778225806451 key: train_roc_auc value: [0.78245614 0.77894737 0.75789474 0.78245614 0.78291621 0.74458962 0.74939885 0.81253834 0.78451724 0.81775856] mean value: 0.7793473193473194 key: test_jcc value: [0.69047619 0.69230769 0.62162162 0.65789474 0.69047619 0.61702128 0.54054054 0.52173913 0.58139535 0.54285714] mean value: 0.615632987098922 key: train_jcc value: [0.65266106 0.6519337 0.62295082 0.65555556 0.65555556 0.63861386 0.56666667 0.6951567 0.65738162 0.70704225] mean value: 0.6503517789195984 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01082611 0.01066971 0.01071644 0.01078057 0.01081586 0.01075149 0.01079583 0.01081753 0.01080179 0.01105452] mean value: 0.010802984237670898 key: score_time value: [0.00914764 0.00914502 0.00916004 0.00921893 0.00907278 0.00916004 0.00923276 0.00920725 0.00922751 0.0093658 ] mean value: 0.009193778038024902 key: test_mcc value: [0.5625 0.438357 0.2214702 0.56360186 0.53549564 0.61895161 0.49493401 0.34405576 0.48255984 0.4307759 ] mean value: 0.46927018296024287 key: train_mcc value: [0.53512128 0.56101149 0.55927353 0.54447222 0.5466585 0.53152779 0.56477321 0.53613782 0.54097122 0.54351524] mean value: 0.5463462303428195 key: test_accuracy value: [0.78125 0.71875 0.609375 0.78125 0.76190476 0.80952381 0.74603175 0.66666667 0.73015873 0.71428571] mean value: 0.7319196428571428 key: train_accuracy value: [0.76491228 0.77894737 0.77894737 0.77017544 0.77232925 0.76532399 0.78108581 0.76707531 0.76882662 0.77057793] mean value: 0.771820137032599 key: test_fscore value: [0.78125 0.72727273 0.57627119 0.78787879 0.78873239 0.8125 0.76470588 0.69565217 0.76056338 0.68965517] mean value: 0.7384481704919857 key: train_fscore value: [0.78032787 0.79 0.78644068 0.78347107 0.78114478 0.77133106 0.79061977 0.77721943 0.78145695 0.78130217] mean value: 0.7823313780270075 key: test_precision value: [0.78125 0.70588235 0.62962963 0.76470588 0.71794872 0.8125 0.72222222 0.63157895 0.675 0.74074074] mean value: 0.7181458493203849 key: train_precision value: [0.73230769 0.75238095 0.76065574 0.740625 0.75080906 0.75083056 0.75641026 0.74598071 0.74213836 0.74760383] mean value: 0.7479742171117733 key: test_recall value: [0.78125 0.75 0.53125 0.8125 0.875 0.8125 0.8125 0.77419355 0.87096774 0.64516129] mean value: 0.7665322580645161 key: train_recall value: [0.83508772 0.83157895 0.81403509 0.83157895 0.81403509 0.79298246 0.82807018 0.81118881 0.82517483 0.81818182] mean value: 0.8201913875598086 key: test_roc_auc value: [0.78125 0.71875 0.609375 0.78125 0.76008065 0.80947581 0.74495968 0.66834677 0.73235887 0.71320565] mean value: 0.7319052419354839 key: train_roc_auc value: [0.76491228 0.77894737 0.77894737 0.77017544 0.77240216 0.76537235 0.78116795 0.76699791 0.76872776 0.77049442] mean value: 0.7718145012881855 key: test_jcc value: [0.64102564 0.57142857 0.4047619 0.65 0.65116279 0.68421053 0.61904762 0.53333333 0.61363636 0.52631579] mean value: 0.5894922539720582 key: train_jcc value: [0.63978495 0.65289256 0.64804469 0.64402174 0.64088398 0.62777778 0.65373961 0.63561644 0.64130435 0.64109589] mean value: 0.6425161984547801 MCC on Blind test: 0.36 Accuracy on Blind test: 0.68 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00977778 0.01034784 0.00981355 0.01110673 0.01137733 0.01119494 0.01055646 0.01125455 0.01110768 0.01122093] mean value: 0.01077578067779541 key: score_time value: [0.01279974 0.01619864 0.01500916 0.01399326 0.01432085 0.01528668 0.0143702 0.01536059 0.01473594 0.01460624] mean value: 0.014668130874633789 key: test_mcc value: [0.51639778 0.50097943 0.25819889 0.4375 0.36661779 0.39757328 0.40025188 0.34405576 0.33021346 0.56710881] mean value: 0.4118897086766542 key: train_mcc value: [0.64338976 0.61922715 0.63817508 0.64068622 0.66915977 0.66656936 0.65172079 0.60720621 0.65801299 0.62989052] mean value: 0.6424037857751423 key: test_accuracy value: [0.75 0.75 0.625 0.71875 0.68253968 0.6984127 0.6984127 0.66666667 0.65079365 0.77777778] mean value: 0.7018353174603175 key: train_accuracy value: [0.82105263 0.80877193 0.81754386 0.81929825 0.83187391 0.83187391 0.82486865 0.80035026 0.82837128 0.81260946] mean value: 0.8196614127262113 key: test_fscore value: [0.77777778 0.74193548 0.66666667 0.71875 0.70588235 0.71641791 0.72463768 0.69565217 0.7027027 0.79411765] mean value: 0.7244540396538339 key: train_fscore value: [0.82653061 0.81556684 0.82608696 0.82630691 0.84158416 0.83892617 0.83108108 0.81433225 0.83389831 0.82372323] mean value: 0.8278036514265043 key: test_precision value: [0.7 0.76666667 0.6 0.71875 0.66666667 0.68571429 0.67567568 0.63157895 0.60465116 0.72972973] mean value: 0.6779433134612143 key: train_precision value: [0.8019802 0.7875817 0.78913738 0.79545455 0.79439252 0.80385852 0.80130293 0.76219512 0.80921053 0.7788162 ] mean value: 0.79239296465173 key: test_recall value: [0.875 0.71875 0.75 0.71875 0.75 0.75 0.78125 0.77419355 0.83870968 0.87096774] mean value: 0.7827620967741935 key: train_recall value: [0.85263158 0.84561404 0.86666667 0.85964912 0.89473684 0.87719298 0.86315789 0.87412587 0.86013986 0.87412587] mean value: 0.8668040731198626 key: test_roc_auc value: [0.75 0.75 0.625 0.71875 0.68145161 0.69758065 0.69707661 0.66834677 0.65372984 0.77923387] mean value: 0.702116935483871 key: train_roc_auc value: [0.82105263 0.80877193 0.81754386 0.81929825 0.83198381 0.83195313 0.82493559 0.80022083 0.82831554 0.81250153] mean value: 0.8196577107103422 key: test_jcc value: [0.63636364 0.58974359 0.5 0.56097561 0.54545455 0.55813953 0.56818182 0.53333333 0.54166667 0.65853659] mean value: 0.5692395319749262 key: train_jcc value: [0.70434783 0.68857143 0.7037037 0.70402299 0.72649573 0.72254335 0.71098266 0.68681319 0.71511628 0.70028011] mean value: 0.7062877262852029 MCC on Blind test: 0.26 Accuracy on Blind test: 0.64 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.03306842 0.03381634 0.02899456 0.02887368 0.02807379 0.03325653 0.03162026 0.03322268 0.03296661 0.03240705] mean value: 0.03162999153137207 key: score_time value: [0.01416421 0.01412463 0.01416826 0.01375866 0.01350045 0.01424146 0.01417565 0.01405978 0.01455617 0.01422095] mean value: 0.014097023010253906 key: test_mcc value: [0.65915306 0.59637658 0.59404013 0.69293487 0.54443762 0.65821474 0.65085805 0.41661348 0.58778119 0.62939541] mean value: 0.6029805129541226 key: train_mcc value: [0.72279499 0.71035225 0.7082164 0.71033605 0.72568869 0.72529147 0.71791058 0.72274854 0.72754221 0.73258386] mean value: 0.7203465041191958 key: test_accuracy value: [0.828125 0.796875 0.796875 0.84375 0.76190476 0.82539683 0.82539683 0.6984127 0.77777778 0.80952381] mean value: 0.7964037698412698 key: train_accuracy value: [0.85964912 0.85438596 0.85263158 0.85263158 0.86164623 0.86164623 0.85814361 0.85989492 0.86164623 0.86339755] mean value: 0.8585673026699849 key: test_fscore value: [0.8358209 0.80597015 0.79365079 0.85294118 0.79452055 0.84057971 0.83076923 0.73239437 0.80555556 0.78571429] mean value: 0.8077916711223889 key: train_fscore value: [0.86622074 0.85908319 0.8590604 0.86092715 0.86677909 0.86632826 0.86247878 0.86622074 0.86898839 0.87171053] mean value: 0.8647797260273575 key: test_precision value: [0.8 0.77142857 0.80645161 0.80555556 0.70731707 0.78378378 0.81818182 0.65 0.70731707 0.88 ] mean value: 0.7730035488194418 key: train_precision value: [0.82747604 0.83223684 0.82315113 0.81504702 0.83441558 0.83660131 0.83552632 0.83012821 0.82649842 0.82298137] mean value: 0.8284062229484791 key: test_recall value: [0.875 0.84375 0.78125 0.90625 0.90625 0.90625 0.84375 0.83870968 0.93548387 0.70967742] mean value: 0.8546370967741935 key: train_recall value: [0.90877193 0.8877193 0.89824561 0.9122807 0.90175439 0.89824561 0.89122807 0.90559441 0.91608392 0.92657343] mean value: 0.9046497362286836 key: test_roc_auc value: [0.828125 0.796875 0.796875 0.84375 0.75957661 0.82409274 0.82510081 0.70060484 0.78024194 0.80796371] mean value: 0.7963205645161291 key: train_roc_auc value: [0.85964912 0.85438596 0.85263158 0.85263158 0.86171635 0.86171022 0.85820145 0.85981475 0.86155073 0.86328671] mean value: 0.8585578456631089 key: test_jcc value: [0.71794872 0.675 0.65789474 0.74358974 0.65909091 0.725 0.71052632 0.57777778 0.6744186 0.64705882] mean value: 0.6788305629219302 key: train_jcc value: [0.7640118 0.75297619 0.75294118 0.75581395 0.76488095 0.7641791 0.75820896 0.7640118 0.76832845 0.77259475] mean value: 0.7617947129272045 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [2.00646472 2.37337565 0.83091569 1.837955 1.98165274 1.99141598 2.06337285 2.09861183 1.97596526 1.9511888 ] mean value: 1.9110918521881104 key: score_time value: [0.01247263 0.01252747 0.01249647 0.01088095 0.01669049 0.01359892 0.01792812 0.01524568 0.01656628 0.01519775] mean value: 0.014360475540161132 key: test_mcc value: [0.57265629 0.75146915 0.50097943 0.65915306 0.49960192 0.62325024 0.55611985 0.58770161 0.61445255 0.5253647 ] mean value: 0.5890748799761464 key: train_mcc value: [0.97558874 0.98606204 0.84694977 0.94395263 0.97898417 0.96195115 0.95494277 0.97207363 0.965351 0.96862386] mean value: 0.9554479759844114 key: test_accuracy value: [0.78125 0.875 0.75 0.828125 0.74603175 0.80952381 0.77777778 0.79365079 0.79365079 0.76190476] mean value: 0.7916914682539682 key: train_accuracy value: [0.9877193 0.99298246 0.92280702 0.97192982 0.98949212 0.98073555 0.97723292 0.98598949 0.98248687 0.98423818] mean value: 0.9775613727839739 key: test_fscore value: [0.8 0.87878788 0.75757576 0.81967213 0.77142857 0.82352941 0.78787879 0.79365079 0.81690141 0.74576271] mean value: 0.7995187452549147 key: train_fscore value: [0.98782609 0.99303136 0.92491468 0.97212544 0.98947368 0.98100173 0.9775475 0.98611111 0.98275862 0.98440208] mean value: 0.9779192275681451 key: test_precision value: [0.73684211 0.85294118 0.73529412 0.86206897 0.71052632 0.77777778 0.76470588 0.78125 0.725 0.78571429] mean value: 0.7732120626532525 key: train_precision value: [0.97931034 0.98615917 0.90033223 0.96539792 0.98947368 0.96598639 0.96258503 0.97931034 0.96938776 0.97594502] mean value: 0.9673887894060526 key: test_recall value: [0.875 0.90625 0.78125 0.78125 0.84375 0.875 0.8125 0.80645161 0.93548387 0.70967742] mean value: 0.8326612903225806 key: train_recall value: [0.99649123 1. 0.95087719 0.97894737 0.98947368 0.99649123 0.99298246 0.99300699 0.9965035 0.99300699] mean value: 0.9887780640412219 key: test_roc_auc value: [0.78125 0.875 0.75 0.828125 0.74445565 0.80846774 0.77721774 0.79385081 0.79586694 0.76108871] mean value: 0.7915322580645161 key: train_roc_auc value: [0.9877193 0.99298246 0.92280702 0.97192982 0.98949209 0.9807631 0.97726046 0.98597718 0.98246227 0.98422279] mean value: 0.9775616488774383 key: test_jcc value: [0.66666667 0.78378378 0.6097561 0.69444444 0.62790698 0.7 0.65 0.65789474 0.69047619 0.59459459] mean value: 0.6675523491112947 key: train_jcc value: [0.97594502 0.98615917 0.86031746 0.94576271 0.97916667 0.96271186 0.95608108 0.97260274 0.96610169 0.96928328] mean value: 0.9574131682160492 MCC on Blind test: 0.48 Accuracy on Blind test: 0.74 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04414916 0.0394628 0.0351615 0.03433752 0.04042983 0.03405404 0.03932333 0.03669691 0.03327823 0.04195404] mean value: 0.03788473606109619 key: score_time value: [0.01111007 0.00946021 0.00911212 0.00907183 0.00917482 0.00916386 0.00986719 0.00945878 0.00997353 0.00954318] mean value: 0.009593558311462403 key: test_mcc value: [0.75 0.90669283 0.71910121 0.875 0.65419917 0.58728587 0.65315611 0.61895161 0.53159579 0.84530217] mean value: 0.7141284759557099 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 0.953125 0.859375 0.9375 0.82539683 0.79365079 0.82539683 0.80952381 0.76190476 0.92063492] mean value: 0.8561507936507936 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.875 0.95238095 0.85714286 0.9375 0.81967213 0.8 0.8358209 0.80645161 0.7761194 0.92307692] mean value: 0.8583164775158962 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.875 0.96774194 0.87096774 0.9375 0.86206897 0.78787879 0.8 0.80645161 0.72222222 0.88235294] mean value: 0.8512184207117303 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.9375 0.84375 0.9375 0.78125 0.8125 0.875 0.80645161 0.83870968 0.96774194] mean value: 0.8675403225806452 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 0.953125 0.859375 0.9375 0.82610887 0.79334677 0.82459677 0.80947581 0.76310484 0.92137097] mean value: 0.8563004032258065 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.77777778 0.90909091 0.75 0.88235294 0.69444444 0.66666667 0.71794872 0.67567568 0.63414634 0.85714286] mean value: 0.7565246331386933 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.59 Accuracy on Blind test: 0.8 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.15806508 0.14951444 0.15549088 0.15570831 0.15163684 0.15120387 0.14514089 0.14617419 0.14691663 0.14852071] mean value: 0.15083718299865723 key: score_time value: [0.02020621 0.01982594 0.0203526 0.01929188 0.01975894 0.01916385 0.02001977 0.01960254 0.0192945 0.02043915] mean value: 0.019795536994934082 key: test_mcc value: [0.75592895 0.71910121 0.59404013 0.69293487 0.55611985 0.5253647 0.5892604 0.4969666 0.53874599 0.7591889 ] mean value: 0.6227651593905483 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 0.859375 0.796875 0.84375 0.77777778 0.76190476 0.79365079 0.74603175 0.76190476 0.87301587] mean value: 0.8089285714285714 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88235294 0.85714286 0.8 0.83333333 0.78787879 0.7761194 0.80597015 0.75757576 0.7826087 0.85714286] mean value: 0.8140124782141044 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 0.87096774 0.78787879 0.89285714 0.76470588 0.74285714 0.77142857 0.71428571 0.71052632 0.96 ] mean value: 0.8048840632718591 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 0.84375 0.8125 0.78125 0.8125 0.8125 0.84375 0.80645161 0.87096774 0.77419355] mean value: 0.8295362903225807 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 0.859375 0.796875 0.84375 0.77721774 0.76108871 0.79284274 0.74697581 0.76360887 0.87147177] mean value: 0.808820564516129 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.78947368 0.75 0.66666667 0.71428571 0.65 0.63414634 0.675 0.6097561 0.64285714 0.75 ] mean value: 0.6882185647044441 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01188111 0.01215482 0.01145577 0.01158643 0.01207089 0.01227689 0.01192045 0.01207066 0.0113132 0.01151323] mean value: 0.011824345588684082 key: score_time value: [0.00942564 0.00984573 0.00936389 0.00920653 0.0097208 0.00978017 0.00969028 0.00946641 0.00935674 0.00909328] mean value: 0.009494948387145995 key: test_mcc value: [0.53150959 0.5 0.53150959 0.51639778 0.23915249 0.55544355 0.40327957 0.29185862 0.39717742 0.42842742] mean value: 0.43947560209106934 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.765625 0.75 0.765625 0.75 0.61904762 0.77777778 0.6984127 0.63492063 0.6984127 0.71428571] mean value: 0.7174107142857142 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76923077 0.75 0.76923077 0.71428571 0.61290323 0.78125 0.6779661 0.68493151 0.6984127 0.70967742] mean value: 0.7167888204865471 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75757576 0.75 0.75757576 0.83333333 0.63333333 0.78125 0.74074074 0.5952381 0.6875 0.70967742] mean value: 0.7246224437151857 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.78125 0.75 0.78125 0.625 0.59375 0.78125 0.625 0.80645161 0.70967742 0.70967742] mean value: 0.7163306451612903 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.765625 0.75 0.765625 0.75 0.61945565 0.77772177 0.69959677 0.63760081 0.69858871 0.71421371] mean value: 0.7178427419354838 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.625 0.6 0.625 0.55555556 0.44186047 0.64102564 0.51282051 0.52083333 0.53658537 0.55 ] mean value: 0.5608680873704981 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.21129918 2.18176174 2.17599201 2.15194917 2.18359351 2.18674564 2.23126411 2.19483113 2.19053888 2.27243757] mean value: 2.198041296005249 key: score_time value: [0.10429049 0.09528875 0.09521317 0.09452415 0.1034019 0.09562993 0.09537983 0.09546041 0.10098195 0.10416818] mean value: 0.09843387603759765 key: test_mcc value: [0.84416229 0.90669283 0.84416229 0.8125 0.80947581 0.71443023 0.68352185 0.81130213 0.72407013 0.87462485] mean value: 0.8024942409158446 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.921875 0.953125 0.921875 0.90625 0.9047619 0.85714286 0.84126984 0.9047619 0.85714286 0.93650794] mean value: 0.9004712301587301 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92063492 0.95238095 0.92307692 0.90625 0.90625 0.86153846 0.84848485 0.90625 0.86567164 0.93333333] mean value: 0.9023871081240484 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93548387 0.96774194 0.90909091 0.90625 0.90625 0.84848485 0.82352941 0.87878788 0.80555556 0.96551724] mean value: 0.8946691651514821 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90625 0.9375 0.9375 0.90625 0.90625 0.875 0.875 0.93548387 0.93548387 0.90322581] mean value: 0.9117943548387096 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.921875 0.953125 0.921875 0.90625 0.9047379 0.85685484 0.84072581 0.90524194 0.85836694 0.9359879 ] mean value: 0.9005040322580645 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.85294118 0.90909091 0.85714286 0.82857143 0.82857143 0.75675676 0.73684211 0.82857143 0.76315789 0.875 ] mean value: 0.8236645985175397 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.03612447 1.02883792 1.05346131 1.02845502 1.04689741 1.01136518 1.0421288 1.02813244 1.04972386 1.01992607] mean value: 1.0345052480697632 key: score_time value: [0.22935534 0.23343563 0.23888326 0.12340879 0.27946091 0.26874781 0.24961305 0.19223285 0.27546382 0.29060841] mean value: 0.23812098503112794 key: test_mcc value: [0.84416229 0.90669283 0.8125 0.8125 0.80947581 0.71705182 0.71705182 0.78160117 0.72407013 0.84484323] mean value: 0.7969949084526066 key: train_mcc value: [0.94063464 0.92677801 0.92728096 0.9340293 0.93052245 0.93391766 0.92303445 0.94432589 0.937524 0.92350399] mean value: 0.9321551356771407 key: test_accuracy value: [0.921875 0.953125 0.90625 0.90625 0.9047619 0.85714286 0.85714286 0.88888889 0.85714286 0.92063492] mean value: 0.8973214285714286 key: train_accuracy value: [0.97017544 0.96315789 0.96315789 0.96666667 0.96497373 0.96672504 0.9614711 0.97197898 0.96847636 0.9614711 ] mean value: 0.9658254216978523 key: test_fscore value: [0.92063492 0.95238095 0.90625 0.90625 0.90625 0.86567164 0.86567164 0.89230769 0.86567164 0.91525424] mean value: 0.8996342727984835 key: train_fscore value: [0.97053726 0.96373057 0.96397942 0.96729776 0.96551724 0.9671848 0.96167247 0.97241379 0.96907216 0.96219931] mean value: 0.9663604798329994 key: test_precision value: [0.93548387 0.96774194 0.90625 0.90625 0.90625 0.82857143 0.82857143 0.85294118 0.80555556 0.96428571] mean value: 0.8901901109906328 key: train_precision value: [0.95890411 0.94897959 0.94295302 0.94932432 0.94915254 0.95238095 0.9550173 0.95918367 0.9527027 0.94594595] mean value: 0.9514544163794261 key: test_recall value: [0.90625 0.9375 0.90625 0.90625 0.90625 0.90625 0.90625 0.93548387 0.93548387 0.87096774] mean value: 0.9116935483870967 key: train_recall value: [0.98245614 0.97894737 0.98596491 0.98596491 0.98245614 0.98245614 0.96842105 0.98601399 0.98601399 0.97902098] mean value: 0.9817715617715618 key: test_roc_auc value: [0.921875 0.953125 0.90625 0.90625 0.9047379 0.85635081 0.85635081 0.88961694 0.85836694 0.91985887] mean value: 0.8972782258064516 key: train_roc_auc value: [0.97017544 0.96315789 0.96315789 0.96666667 0.96500429 0.96675255 0.96148325 0.97195436 0.96844559 0.96144031] mean value: 0.9658238252975095 key: test_jcc value: [0.85294118 0.90909091 0.82857143 0.82857143 0.82857143 0.76315789 0.76315789 0.80555556 0.76315789 0.84375 ] mean value: 0.8186525611041865 key: train_jcc value: [0.94276094 0.93 0.93046358 0.93666667 0.93333333 0.93645485 0.9261745 0.94630872 0.94 0.92715232] mean value: 0.9349314907775516 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01233745 0.01187658 0.01076055 0.01092172 0.01135612 0.01156545 0.01084685 0.01140285 0.01177788 0.01220751] mean value: 0.01150529384613037 key: score_time value: [0.01918459 0.00934148 0.00909972 0.00989866 0.0099771 0.00917387 0.00946617 0.01000452 0.00948906 0.01004934] mean value: 0.010568451881408692 key: test_mcc value: [0.5625 0.438357 0.2214702 0.56360186 0.53549564 0.61895161 0.49493401 0.34405576 0.48255984 0.4307759 ] mean value: 0.46927018296024287 key: train_mcc value: [0.53512128 0.56101149 0.55927353 0.54447222 0.5466585 0.53152779 0.56477321 0.53613782 0.54097122 0.54351524] mean value: 0.5463462303428195 key: test_accuracy value: [0.78125 0.71875 0.609375 0.78125 0.76190476 0.80952381 0.74603175 0.66666667 0.73015873 0.71428571] mean value: 0.7319196428571428 key: train_accuracy value: [0.76491228 0.77894737 0.77894737 0.77017544 0.77232925 0.76532399 0.78108581 0.76707531 0.76882662 0.77057793] mean value: 0.771820137032599 key: test_fscore value: [0.78125 0.72727273 0.57627119 0.78787879 0.78873239 0.8125 0.76470588 0.69565217 0.76056338 0.68965517] mean value: 0.7384481704919857 key: train_fscore value: [0.78032787 0.79 0.78644068 0.78347107 0.78114478 0.77133106 0.79061977 0.77721943 0.78145695 0.78130217] mean value: 0.7823313780270075 key: test_precision value: [0.78125 0.70588235 0.62962963 0.76470588 0.71794872 0.8125 0.72222222 0.63157895 0.675 0.74074074] mean value: 0.7181458493203849 key: train_precision value: [0.73230769 0.75238095 0.76065574 0.740625 0.75080906 0.75083056 0.75641026 0.74598071 0.74213836 0.74760383] mean value: 0.7479742171117733 key: test_recall value: [0.78125 0.75 0.53125 0.8125 0.875 0.8125 0.8125 0.77419355 0.87096774 0.64516129] mean value: 0.7665322580645161 key: train_recall value: [0.83508772 0.83157895 0.81403509 0.83157895 0.81403509 0.79298246 0.82807018 0.81118881 0.82517483 0.81818182] mean value: 0.8201913875598086 key: test_roc_auc value: [0.78125 0.71875 0.609375 0.78125 0.76008065 0.80947581 0.74495968 0.66834677 0.73235887 0.71320565] mean value: 0.7319052419354839 key: train_roc_auc value: [0.76491228 0.77894737 0.77894737 0.77017544 0.77240216 0.76537235 0.78116795 0.76699791 0.76872776 0.77049442] mean value: 0.7718145012881855 key: test_jcc value: [0.64102564 0.57142857 0.4047619 0.65 0.65116279 0.68421053 0.61904762 0.53333333 0.61363636 0.52631579] mean value: 0.5894922539720582 key: train_jcc value: [0.63978495 0.65289256 0.64804469 0.64402174 0.64088398 0.62777778 0.65373961 0.63561644 0.64130435 0.64109589] mean value: 0.6425161984547801 MCC on Blind test: 0.36 Accuracy on Blind test: 0.68 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.13100171 0.1108017 0.11466289 0.1149056 0.11569571 0.11424947 0.1059444 0.10726643 0.10647297 0.10821867] mean value: 0.11292195320129395 key: score_time value: [0.01161242 0.01174259 0.01199245 0.01133132 0.012321 0.01231289 0.011199 0.01135135 0.01112986 0.01113558] mean value: 0.011612844467163087 key: test_mcc value: [0.84416229 0.9375 0.84416229 0.8125 0.90524194 0.71705182 0.71443023 0.81130213 0.72407013 0.96875 ] mean value: 0.8279170821280594 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.921875 0.96875 0.921875 0.90625 0.95238095 0.85714286 0.85714286 0.9047619 0.85714286 0.98412698] mean value: 0.9131448412698413 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92307692 0.96875 0.92063492 0.90625 0.95238095 0.86567164 0.86153846 0.90625 0.86567164 0.98412698] mean value: 0.9154351525340331 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 0.96875 0.93548387 0.90625 0.96774194 0.82857143 0.84848485 0.87878788 0.80555556 0.96875 ] mean value: 0.9017466426942233 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 0.96875 0.90625 0.90625 0.9375 0.90625 0.875 0.93548387 0.93548387 1. ] mean value: 0.9308467741935483 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.921875 0.96875 0.921875 0.90625 0.95262097 0.85635081 0.85685484 0.90524194 0.85836694 0.984375 ] mean value: 0.9132560483870967 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.85714286 0.93939394 0.85294118 0.82857143 0.90909091 0.76315789 0.75675676 0.82857143 0.76315789 0.96875 ] mean value: 0.8467534285471592 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.73 Accuracy on Blind test: 0.87 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04637885 0.08834767 0.06918025 0.05507755 0.08053446 0.07429886 0.04650426 0.07581925 0.07261181 0.05177426] mean value: 0.0660527229309082 key: score_time value: [0.02093959 0.01971197 0.01240396 0.01932096 0.01950669 0.01235509 0.01232791 0.01954722 0.01231313 0.01326156] mean value: 0.016168808937072753 key: test_mcc value: [0.56360186 0.64549722 0.40644851 0.76354172 0.69429215 0.61982085 0.71705182 0.60364273 0.57596915 0.58728587] mean value: 0.6177151884770942 key: train_mcc value: [0.84327404 0.82219219 0.84694977 0.84327404 0.85002782 0.85365432 0.82953205 0.82950084 0.8372133 0.82614956] mean value: 0.8381767933116789 key: test_accuracy value: [0.78125 0.8125 0.703125 0.875 0.84126984 0.80952381 0.85714286 0.79365079 0.77777778 0.79365079] mean value: 0.8044890873015873 key: train_accuracy value: [0.92105263 0.91052632 0.92280702 0.92105263 0.92469352 0.92644483 0.91418564 0.91418564 0.91768827 0.91243433] mean value: 0.9185070820659355 key: test_fscore value: [0.78787879 0.83333333 0.6984127 0.86206897 0.85714286 0.81818182 0.86567164 0.8115942 0.8 0.78688525] mean value: 0.8121169551057972 key: train_fscore value: [0.92307692 0.91282051 0.92491468 0.92307692 0.92598967 0.92783505 0.91623932 0.9165247 0.92047377 0.91496599] mean value: 0.9205917537039755 key: test_precision value: [0.76470588 0.75 0.70967742 0.96153846 0.78947368 0.79411765 0.82857143 0.73684211 0.71794872 0.8 ] mean value: 0.7852875346298895 key: train_precision value: [0.9 0.89 0.90033223 0.9 0.90878378 0.90909091 0.89333333 0.89368771 0.89180328 0.89072848] mean value: 0.897775971527256 key: test_recall value: [0.8125 0.9375 0.6875 0.78125 0.9375 0.84375 0.90625 0.90322581 0.90322581 0.77419355] mean value: 0.8486895161290322 key: train_recall value: [0.94736842 0.93684211 0.95087719 0.94736842 0.94385965 0.94736842 0.94035088 0.94055944 0.95104895 0.94055944] mean value: 0.944620291988713 key: test_roc_auc value: [0.78125 0.8125 0.703125 0.875 0.83971774 0.80897177 0.85635081 0.7953629 0.7797379 0.79334677] mean value: 0.8045362903225807 key: train_roc_auc value: [0.92105263 0.91052632 0.92280702 0.92105263 0.92472703 0.92648141 0.91423138 0.91413937 0.91762974 0.91238498] mean value: 0.91850325113483 key: test_jcc value: [0.65 0.71428571 0.53658537 0.75757576 0.75 0.69230769 0.76315789 0.68292683 0.66666667 0.64864865] mean value: 0.6862154569343273 key: train_jcc value: [0.85714286 0.83962264 0.86031746 0.85714286 0.86217949 0.86538462 0.84542587 0.84591195 0.85266458 0.84326019] mean value: 0.8529052500760415 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02883339 0.01046467 0.01015687 0.00999999 0.01025605 0.01001477 0.01010776 0.01026273 0.0100286 0.01012015] mean value: 0.012024497985839844 key: score_time value: [0.0092907 0.00902653 0.00880337 0.00878048 0.0087707 0.00881338 0.00873876 0.00882483 0.00882506 0.008744 ] mean value: 0.008861780166625977 key: test_mcc value: [0.51639778 0.5336001 0.50097943 0.68884672 0.60087592 0.59372402 0.59372402 0.43812738 0.48255984 0.56449867] mean value: 0.5513333888290408 key: train_mcc value: [0.58176182 0.61191897 0.58742755 0.57887321 0.60786984 0.56185823 0.56981689 0.61476947 0.615254 0.61631894] mean value: 0.5945868902542161 key: test_accuracy value: [0.75 0.765625 0.75 0.84375 0.79365079 0.79365079 0.79365079 0.71428571 0.73015873 0.77777778] mean value: 0.7712549603174603 key: train_accuracy value: [0.78947368 0.80350877 0.79122807 0.7877193 0.80210158 0.7793345 0.78283713 0.8056042 0.8056042 0.8056042 ] mean value: 0.7953015638922174 key: test_fscore value: [0.77777778 0.7761194 0.74193548 0.84848485 0.81690141 0.8115942 0.8115942 0.73529412 0.76056338 0.75 ] mean value: 0.7830264825295223 key: train_fscore value: [0.7993311 0.81518152 0.80395387 0.79866889 0.81198003 0.79 0.79470199 0.8159204 0.81652893 0.81773399] mean value: 0.8064000712331675 key: test_precision value: [0.7 0.74285714 0.76666667 0.82352941 0.74358974 0.75675676 0.75675676 0.67567568 0.675 0.84 ] mean value: 0.7480832154067448 key: train_precision value: [0.76357827 0.7694704 0.75776398 0.75949367 0.7721519 0.75238095 0.7523511 0.77602524 0.77429467 0.77089783] mean value: 0.7648408014336768 key: test_recall value: [0.875 0.8125 0.71875 0.875 0.90625 0.875 0.875 0.80645161 0.87096774 0.67741935] mean value: 0.8292338709677419 key: train_recall value: [0.83859649 0.86666667 0.85614035 0.84210526 0.85614035 0.83157895 0.84210526 0.86013986 0.86363636 0.87062937] mean value: 0.8527738927738928 key: test_roc_auc value: [0.75 0.765625 0.75 0.84375 0.79183468 0.79233871 0.79233871 0.71572581 0.73235887 0.77620968] mean value: 0.7710181451612903 key: train_roc_auc value: [0.78947368 0.80350877 0.79122807 0.7877193 0.80219605 0.77942584 0.78294074 0.80550853 0.80550239 0.80549012] mean value: 0.795299349773034 key: test_jcc value: [0.63636364 0.63414634 0.58974359 0.73684211 0.69047619 0.68292683 0.68292683 0.58139535 0.61363636 0.6 ] mean value: 0.6448457234320147 key: train_jcc value: [0.66573816 0.68802228 0.67217631 0.66481994 0.68347339 0.65289256 0.65934066 0.68907563 0.68994413 0.69166667] mean value: 0.6757149740497587 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01466751 0.02034426 0.02604008 0.02144074 0.02184224 0.02580714 0.02606153 0.02597189 0.02331066 0.02317643] mean value: 0.022866249084472656 key: score_time value: [0.01097655 0.01114917 0.01177526 0.01173902 0.01183462 0.01193428 0.01192403 0.01199174 0.01196337 0.01206636] mean value: 0.01173543930053711 key: test_mcc value: [0.52915026 0.62622429 0.6011334 0.75592895 0.57258185 0.55611985 0.39842149 0.50132936 0.53159579 0.55611985] mean value: 0.5628605087006604 key: train_mcc value: [0.48845623 0.73720978 0.75400915 0.73036878 0.72229646 0.73423379 0.61275359 0.6925491 0.7308577 0.72561169] mean value: 0.6928346269749263 key: test_accuracy value: [0.71875 0.8125 0.796875 0.875 0.77777778 0.77777778 0.68253968 0.71428571 0.76190476 0.77777778] mean value: 0.7695188492063492 key: train_accuracy value: [0.69473684 0.86842105 0.87017544 0.86491228 0.85639229 0.86690018 0.78283713 0.82661996 0.86514886 0.86164623] mean value: 0.8357790272528959 key: test_fscore value: [0.7804878 0.80645161 0.8115942 0.86666667 0.80555556 0.78787879 0.61538462 0.76923077 0.7761194 0.76666667] mean value: 0.7786036085047962 key: train_fscore value: [0.76549865 0.87046632 0.88141026 0.86225403 0.86688312 0.86428571 0.73043478 0.85157421 0.8627451 0.86722689] mean value: 0.8422779070456206 key: test_precision value: [0.64 0.83333333 0.75675676 0.92857143 0.725 0.76470588 0.8 0.63829787 0.72222222 0.79310345] mean value: 0.760199094385297 key: train_precision value: [0.6214442 0.85714286 0.81120944 0.87956204 0.80664653 0.88 0.96 0.74540682 0.88 0.83495146] mean value: 0.8276363347916831 key: test_recall value: [1. 0.78125 0.875 0.8125 0.90625 0.8125 0.5 0.96774194 0.83870968 0.74193548] mean value: 0.8235887096774194 key: train_recall value: [0.99649123 0.88421053 0.96491228 0.84561404 0.93684211 0.84912281 0.58947368 0.99300699 0.84615385 0.9020979 ] mean value: 0.8807925407925408 key: test_roc_auc value: [0.71875 0.8125 0.796875 0.875 0.77570565 0.77721774 0.68548387 0.71824597 0.76310484 0.77721774] mean value: 0.7700100806451613 key: train_roc_auc value: [0.69473684 0.86842105 0.87017544 0.86491228 0.85653294 0.8668691 0.78249908 0.82632806 0.86518219 0.86157527] mean value: 0.8357232241442768 key: test_jcc value: [0.64 0.67567568 0.68292683 0.76470588 0.6744186 0.65 0.44444444 0.625 0.63414634 0.62162162] mean value: 0.6412939399477553 key: train_jcc value: [0.62008734 0.7706422 0.78796562 0.75786164 0.76504298 0.76100629 0.57534247 0.74151436 0.75862069 0.76557864] mean value: 0.7303662209332994 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02438712 0.02357697 0.02358818 0.03072786 0.02425742 0.02381968 0.05035043 0.02409315 0.01967001 0.02207899] mean value: 0.026654982566833497 key: score_time value: [0.01202822 0.01200724 0.01198888 0.01192451 0.01296234 0.01313114 0.01228142 0.01198983 0.01198268 0.01196551] mean value: 0.012226176261901856 key: test_mcc value: [0.43033148 0.63228041 0.5378562 0.71910121 0.52928314 0.61982085 0.78719616 0.45528691 0.4969666 0.68865372] mean value: 0.589677668921627 key: train_mcc value: [0.45769586 0.7525536 0.70906282 0.76491699 0.67239075 0.79105503 0.78723304 0.70209276 0.72529147 0.71320874] mean value: 0.7075501065137321 key: test_accuracy value: [0.65625 0.796875 0.765625 0.859375 0.74603175 0.80952381 0.88888889 0.6984127 0.74603175 0.84126984] mean value: 0.780828373015873 key: train_accuracy value: [0.6754386 0.86666667 0.84736842 0.88245614 0.81611208 0.89316988 0.89141856 0.83362522 0.86164623 0.84238179] mean value: 0.8410283589885397 key: test_fscore value: [0.74418605 0.82666667 0.7826087 0.85714286 0.78947368 0.81818182 0.89855072 0.75324675 0.75757576 0.84848485] mean value: 0.807611785231071 key: train_fscore value: [0.75431607 0.88012618 0.86124402 0.88224956 0.84257871 0.8985025 0.89666667 0.85627837 0.85662432 0.86196319] mean value: 0.8590549580660698 key: test_precision value: [0.59259259 0.72093023 0.72972973 0.87096774 0.68181818 0.79411765 0.83783784 0.63043478 0.71428571 0.8 ] mean value: 0.7372714460425198 key: train_precision value: [0.60683761 0.79942693 0.78947368 0.88380282 0.73560209 0.85443038 0.85396825 0.75466667 0.89056604 0.76775956] mean value: 0.7936534037246936 key: test_recall value: [1. 0.96875 0.84375 0.84375 0.9375 0.84375 0.96875 0.93548387 0.80645161 0.90322581] mean value: 0.9051411290322581 key: train_recall value: [0.99649123 0.97894737 0.94736842 0.88070175 0.98596491 0.94736842 0.94385965 0.98951049 0.82517483 0.98251748] mean value: 0.9477904551588762 key: test_roc_auc value: [0.65625 0.796875 0.765625 0.859375 0.74294355 0.80897177 0.88760081 0.70211694 0.74697581 0.8422379 ] mean value: 0.7808971774193548 key: train_roc_auc value: [0.6754386 0.86666667 0.84736842 0.88245614 0.81640903 0.89326463 0.89151024 0.83335174 0.86171022 0.84213593] mean value: 0.8410311618206355 key: test_jcc value: [0.59259259 0.70454545 0.64285714 0.75 0.65217391 0.69230769 0.81578947 0.60416667 0.6097561 0.73684211] mean value: 0.6801031138521372 key: train_jcc value: [0.60554371 0.78591549 0.75630252 0.78930818 0.72797927 0.81570997 0.81268882 0.74867725 0.74920635 0.7574124 ] mean value: 0.7548743963045716 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.22404265 0.20949054 0.21080303 0.2099843 0.21042252 0.20934153 0.20883155 0.21123242 0.21139431 0.21479511] mean value: 0.2120337963104248 key: score_time value: [0.01565266 0.0155592 0.015872 0.01542377 0.01540327 0.0154717 0.01560116 0.01561856 0.01600695 0.01574636] mean value: 0.0156355619430542 key: test_mcc value: [0.87671401 0.84416229 0.75 0.84416229 0.84530217 0.68352185 0.71443023 0.68865372 0.76058095 0.87487431] mean value: 0.7882401819986872 key: train_mcc value: [0.95791832 0.93338505 0.9754446 0.94751425 0.96862577 0.95451924 0.96497362 0.96152336 0.96852915 0.94398027] mean value: 0.9576413651109649 key: test_accuracy value: [0.9375 0.921875 0.875 0.921875 0.92063492 0.84126984 0.85714286 0.84126984 0.87301587 0.93650794] mean value: 0.892609126984127 key: train_accuracy value: [0.97894737 0.96666667 0.9877193 0.97368421 0.98423818 0.97723292 0.98248687 0.98073555 0.98423818 0.97197898] mean value: 0.9787928226871908 key: test_fscore value: [0.93939394 0.92063492 0.875 0.92063492 0.91803279 0.84848485 0.86153846 0.84848485 0.88235294 0.9375 ] mean value: 0.8952057667233655 key: train_fscore value: [0.97902098 0.96684119 0.98769772 0.97391304 0.98434783 0.97731239 0.98245614 0.98086957 0.98434783 0.97212544] mean value: 0.9788932108732904 key: test_precision value: [0.91176471 0.93548387 0.875 0.93548387 0.96551724 0.82352941 0.84848485 0.8 0.81081081 0.90909091] mean value: 0.8815165669348421 key: train_precision value: [0.97560976 0.96180556 0.98943662 0.96551724 0.97586207 0.97222222 0.98245614 0.97577855 0.97923875 0.96875 ] mean value: 0.9746676905327416 key: test_recall value: [0.96875 0.90625 0.875 0.90625 0.875 0.875 0.875 0.90322581 0.96774194 0.96774194] mean value: 0.9119959677419355 key: train_recall value: [0.98245614 0.97192982 0.98596491 0.98245614 0.99298246 0.98245614 0.98245614 0.98601399 0.98951049 0.97552448] mean value: 0.9831750705434916 key: test_roc_auc value: [0.9375 0.921875 0.875 0.921875 0.92137097 0.84072581 0.85685484 0.8422379 0.87449597 0.93699597] mean value: 0.8928931451612904 key: train_roc_auc value: [0.97894737 0.96666667 0.9877193 0.97368421 0.98425347 0.97724206 0.98248681 0.98072629 0.98422893 0.97197276] mean value: 0.9787927861612072 key: test_jcc value: [0.88571429 0.85294118 0.77777778 0.85294118 0.84848485 0.73684211 0.75675676 0.73684211 0.78947368 0.88235294] mean value: 0.8120126857588158 key: train_jcc value: [0.95890411 0.93581081 0.97569444 0.94915254 0.96917808 0.9556314 0.96551724 0.96245734 0.96917808 0.94576271] mean value: 0.9587286762045821 MCC on Blind test: 0.71 Accuracy on Blind test: 0.86 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.09277225 0.11120796 0.10793185 0.10711575 0.11093473 0.12785196 0.11650276 0.08445287 0.10883117 0.11838365] mean value: 0.10859849452972412 key: score_time value: [0.02065492 0.03160429 0.0220325 0.02657008 0.0324316 0.03746438 0.04351306 0.02652216 0.03911519 0.03521681] mean value: 0.03151249885559082 key: test_mcc value: [0.78163175 0.87671401 0.78163175 0.84416229 0.84530217 0.72270545 0.78094752 0.77822581 0.72407013 0.96875 ] mean value: 0.8104140873134662 key: train_mcc value: [0.96857012 0.98246219 0.98947978 0.99300691 0.98954653 0.98601347 0.98601347 0.97548767 0.9757759 0.98254138] mean value: 0.9828897414834783 key: test_accuracy value: [0.890625 0.9375 0.890625 0.921875 0.92063492 0.85714286 0.88888889 0.88888889 0.85714286 0.98412698] mean value: 0.9037450396825396 key: train_accuracy value: [0.98421053 0.99122807 0.99473684 0.99649123 0.99474606 0.99299475 0.99299475 0.98774081 0.98774081 0.99124343] mean value: 0.9914127262113251 key: test_fscore value: [0.88888889 0.93548387 0.89230769 0.92307692 0.91803279 0.86956522 0.89552239 0.88888889 0.86567164 0.98412698] mean value: 0.9061565282384415 key: train_fscore value: [0.9840708 0.99124343 0.99472759 0.99647887 0.99470899 0.99295775 0.99295775 0.98774081 0.98761062 0.99121265] mean value: 0.991370926105971 key: test_precision value: [0.90322581 0.96666667 0.87878788 0.90909091 0.96551724 0.81081081 0.85714286 0.875 0.80555556 0.96875 ] mean value: 0.8940547725885601 key: train_precision value: [0.99285714 0.98951049 0.99647887 1. 1. 0.99646643 0.99646643 0.98947368 1. 0.99646643] mean value: 0.9957719483103814 key: test_recall value: [0.875 0.90625 0.90625 0.9375 0.875 0.9375 0.9375 0.90322581 0.93548387 1. ] mean value: 0.9213709677419355 key: train_recall value: [0.9754386 0.99298246 0.99298246 0.99298246 0.98947368 0.98947368 0.98947368 0.98601399 0.97552448 0.98601399] mean value: 0.9870359465096307 key: test_roc_auc value: [0.890625 0.9375 0.890625 0.921875 0.92137097 0.85584677 0.88810484 0.8891129 0.85836694 0.984375 ] mean value: 0.9037802419354839 key: train_roc_auc value: [0.98421053 0.99122807 0.99473684 0.99649123 0.99473684 0.99298859 0.99298859 0.98774384 0.98776224 0.99125261] mean value: 0.9914139369402527 key: test_jcc value: [0.8 0.87878788 0.80555556 0.85714286 0.84848485 0.76923077 0.81081081 0.8 0.76315789 0.96875 ] mean value: 0.8301920614749563 key: train_jcc value: [0.96864111 0.98263889 0.98951049 0.99298246 0.98947368 0.98601399 0.98601399 0.97577855 0.97552448 0.9825784 ] mean value: 0.9829156025210628 MCC on Blind test: 0.61 Accuracy on Blind test: 0.81 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.21769905 0.2195487 0.23250198 0.23222423 0.26852727 0.24305701 0.23453808 0.23407316 0.23557377 0.23404908] mean value: 0.23517923355102538 key: score_time value: [0.02685475 0.02689195 0.02682996 0.0266974 0.02714157 0.02700257 0.02708054 0.02719736 0.02715063 0.02718496] mean value: 0.027003169059753418 key: test_mcc value: [0.76354172 0.71910121 0.5336001 0.56360186 0.46146899 0.42871785 0.49193548 0.3798283 0.43923912 0.58770161] mean value: 0.5368736246899668 key: train_mcc value: [0.95486219 0.95848494 0.93741933 0.94105208 0.94115791 0.94816792 0.95154989 0.94115006 0.93777673 0.93051399] mean value: 0.9442135037881351 key: test_accuracy value: [0.875 0.859375 0.765625 0.78125 0.73015873 0.71428571 0.74603175 0.68253968 0.6984127 0.79365079] mean value: 0.7646329365079365 key: train_accuracy value: [0.97719298 0.97894737 0.96842105 0.97017544 0.97022767 0.9737303 0.97548161 0.97022767 0.96847636 0.96497373] mean value: 0.9717854180108766 key: test_fscore value: [0.88571429 0.86153846 0.7761194 0.77419355 0.74626866 0.72727273 0.75 0.71428571 0.74666667 0.79365079] mean value: 0.7775710257217239 key: train_fscore value: [0.9775475 0.97931034 0.96896552 0.9707401 0.9707401 0.97418244 0.97586207 0.97084048 0.96917808 0.96563574] mean value: 0.9723002378616942 key: test_precision value: [0.81578947 0.84848485 0.74285714 0.8 0.71428571 0.70588235 0.75 0.64102564 0.63636364 0.78125 ] mean value: 0.7435938809642371 key: train_precision value: [0.96258503 0.96271186 0.95254237 0.9527027 0.9527027 0.95608108 0.95932203 0.95286195 0.94966443 0.94932432] mean value: 0.9550498498403011 key: test_recall value: [0.96875 0.875 0.8125 0.75 0.78125 0.75 0.75 0.80645161 0.90322581 0.80645161] mean value: 0.8203629032258064 key: train_recall value: [0.99298246 0.99649123 0.98596491 0.98947368 0.98947368 0.99298246 0.99298246 0.98951049 0.98951049 0.98251748] mean value: 0.9901889338731443 key: test_roc_auc value: [0.875 0.859375 0.765625 0.78125 0.72933468 0.71370968 0.74596774 0.68447581 0.7016129 0.79385081] mean value: 0.7650201612903226 key: train_roc_auc value: [0.97719298 0.97894737 0.96842105 0.97017544 0.97026132 0.97376396 0.97551221 0.97019384 0.96843946 0.96494295] mean value: 0.9717850570482149 key: test_jcc value: [0.79487179 0.75675676 0.63414634 0.63157895 0.5952381 0.57142857 0.6 0.55555556 0.59574468 0.65789474] mean value: 0.6393215480375779 key: train_jcc value: [0.95608108 0.95945946 0.93979933 0.94314381 0.94314381 0.94966443 0.95286195 0.94333333 0.94019934 0.93355482] mean value: 0.9461241365611688 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.89135027 0.88410759 0.87596011 0.88601875 0.88395333 0.88206983 0.87664676 0.90293217 0.88312197 0.88222742] mean value: 0.8848388195037842 key: score_time value: [0.00947404 0.00941205 0.00930047 0.00934148 0.00985909 0.00946188 0.0094347 0.00927377 0.00946712 0.00947332] mean value: 0.00944979190826416 key: test_mcc value: [0.84416229 0.875 0.78163175 0.78163175 0.87487431 0.78094752 0.74596774 0.77822581 0.72407013 0.93649194] mean value: 0.8123003234581166 key: train_mcc value: [0.99298246 1. 1. 0.99649736 0.99301901 0.99650345 0.99299472 0.98949809 0.99299472 0.98954691] mean value: 0.9944036729473622 key: test_accuracy value: [0.921875 0.9375 0.890625 0.890625 0.93650794 0.88888889 0.87301587 0.88888889 0.85714286 0.96825397] mean value: 0.9053323412698413 key: train_accuracy value: [0.99649123 1. 1. 0.99824561 0.99649737 0.99824869 0.99649737 0.99474606 0.99649737 0.99474606] mean value: 0.9971969766798783 key: test_fscore value: [0.92307692 0.9375 0.89230769 0.88888889 0.93548387 0.89552239 0.875 0.88888889 0.86567164 0.96774194] mean value: 0.9070082229464752 key: train_fscore value: [0.99649123 1. 1. 0.99824253 0.99647887 0.99824253 0.99649123 0.9947644 0.9965035 0.99472759] mean value: 0.9971941877567602 key: test_precision value: [0.90909091 0.9375 0.87878788 0.90322581 0.96666667 0.85714286 0.875 0.875 0.80555556 0.96774194] mean value: 0.8975711609179351 key: train_precision value: [0.99649123 1. 1. 1. 1. 1. 0.99649123 0.99303136 0.9965035 1. ] mean value: 0.9982517311528865 key: test_recall value: [0.9375 0.9375 0.90625 0.875 0.90625 0.9375 0.875 0.90322581 0.93548387 0.96774194] mean value: 0.9181451612903225 key: train_recall value: [0.99649123 1. 1. 0.99649123 0.99298246 0.99649123 0.99649123 0.9965035 0.9965035 0.98951049] mean value: 0.9961464850938535 key: test_roc_auc value: [0.921875 0.9375 0.890625 0.890625 0.93699597 0.88810484 0.87298387 0.8891129 0.85836694 0.96824597] mean value: 0.9054435483870967 key: train_roc_auc value: [0.99649123 1. 1. 0.99824561 0.99649123 0.99824561 0.99649736 0.99474298 0.99649736 0.99475524] mean value: 0.9971966629861366 key: test_jcc value: [0.85714286 0.88235294 0.80555556 0.8 0.87878788 0.81081081 0.77777778 0.8 0.76315789 0.9375 ] mean value: 0.8313085715988193 key: train_jcc value: [0.99300699 1. 1. 0.99649123 0.99298246 0.99649123 0.99300699 0.98958333 0.99303136 0.98951049] mean value: 0.9944104080023528 MCC on Blind test: 0.71 Accuracy on Blind test: 0.86 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.0329814 0.03775144 0.04330111 0.03259683 0.03292823 0.03234887 0.03247094 0.03311419 0.0330894 0.03246641] mean value: 0.03430488109588623 key: score_time value: [0.01277876 0.01312947 0.01511431 0.01498032 0.01761007 0.01501441 0.01503301 0.01509452 0.01503158 0.01518631] mean value: 0.0148972749710083 key: test_mcc value: [0.17213259 0.32163376 0.28347335 0.22473329 0.11331178 0.10141277 0.4672925 0.1715272 0.31933319 0.29699435] mean value: 0.24718447769250632 key: train_mcc value: [0.34935261 0.33007486 0.3365728 0.33333333 0.3515425 0.35467079 0.31597841 0.35903931 0.33033226 0.38353707] mean value: 0.3444433939887435 key: test_accuracy value: [0.5625 0.59375 0.59375 0.578125 0.53968254 0.53968254 0.68253968 0.53968254 0.58730159 0.61904762] mean value: 0.5836061507936507 key: train_accuracy value: [0.60877193 0.59824561 0.60175439 0.6 0.60945709 0.61120841 0.59019264 0.61471103 0.59894921 0.62872154] mean value: 0.6062011859772022 key: test_fscore value: [0.6744186 0.71111111 0.70454545 0.68965517 0.6741573 0.66666667 0.76190476 0.6741573 0.70454545 0.7 ] mean value: 0.6961161832579978 key: train_fscore value: [0.71878941 0.71339174 0.71518193 0.71428571 0.71878941 0.71969697 0.70895522 0.72222222 0.71410737 0.72959184] mean value: 0.7175011819161466 key: test_precision value: [0.53703704 0.55172414 0.55357143 0.54545455 0.52631579 0.52727273 0.61538462 0.51724138 0.54385965 0.57142857] mean value: 0.5489289880986796 key: train_precision value: [0.56102362 0.55447471 0.55664062 0.55555556 0.56102362 0.56213018 0.54913295 0.56521739 0.55533981 0.57429719] mean value: 0.5594835644197532 key: test_recall value: [0.90625 1. 0.96875 0.9375 0.9375 0.90625 1. 0.96774194 1. 0.90322581] mean value: 0.9527217741935484 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.5625 0.59375 0.59375 0.578125 0.53326613 0.53377016 0.67741935 0.54637097 0.59375 0.6234879 ] mean value: 0.5836189516129032 key: train_roc_auc value: [0.60877193 0.59824561 0.60175439 0.6 0.61013986 0.61188811 0.59090909 0.61403509 0.59824561 0.62807018] mean value: 0.6062059869954607 key: test_jcc value: [0.50877193 0.55172414 0.54385965 0.52631579 0.50847458 0.5 0.61538462 0.50847458 0.54385965 0.53846154] mean value: 0.5345326461863421 key: train_jcc value: [0.56102362 0.55447471 0.55664062 0.55555556 0.56102362 0.56213018 0.54913295 0.56521739 0.55533981 0.57429719] mean value: 0.5594835644197532 MCC on Blind test: 0.04 Accuracy on Blind test: 0.46 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03066587 0.03988957 0.05028677 0.04420638 0.04806757 0.04310489 0.03911352 0.03920937 0.03922009 0.03915834] mean value: 0.041292238235473636 key: score_time value: [0.01892376 0.01950431 0.01943779 0.01930618 0.01957011 0.01895785 0.01866388 0.01875091 0.01871824 0.01869988] mean value: 0.0190532922744751 key: test_mcc value: [0.65657067 0.6644106 0.59404013 0.71910121 0.62939541 0.61982085 0.72270545 0.53874599 0.64134943 0.62939541] mean value: 0.6415535145585206 key: train_mcc value: [0.78479784 0.7840214 0.77698982 0.7922879 0.78479063 0.79883396 0.78109075 0.77367788 0.77401834 0.80014056] mean value: 0.7850649085161974 key: test_accuracy value: [0.828125 0.828125 0.796875 0.859375 0.80952381 0.80952381 0.85714286 0.76190476 0.80952381 0.80952381] mean value: 0.8169642857142857 key: train_accuracy value: [0.89122807 0.89122807 0.8877193 0.89473684 0.89141856 0.89842382 0.88966725 0.88616462 0.88616462 0.89842382] mean value: 0.8915174977724521 key: test_fscore value: [0.83076923 0.84057971 0.8 0.85714286 0.82857143 0.81818182 0.86956522 0.7826087 0.82857143 0.78571429] mean value: 0.8241704672139455 key: train_fscore value: [0.89527027 0.89455782 0.89115646 0.8989899 0.89491525 0.90169492 0.89303905 0.88964346 0.89001692 0.90301003] mean value: 0.8952294091118016 key: test_precision value: [0.81818182 0.78378378 0.78787879 0.87096774 0.76315789 0.79411765 0.81081081 0.71052632 0.74358974 0.88 ] mean value: 0.7963014543765567 key: train_precision value: [0.86319218 0.8679868 0.86468647 0.86407767 0.86557377 0.87213115 0.86513158 0.86468647 0.86229508 0.86538462] mean value: 0.8655145782618917 key: test_recall value: [0.84375 0.90625 0.8125 0.84375 0.90625 0.84375 0.9375 0.87096774 0.93548387 0.70967742] mean value: 0.8609879032258064 key: train_recall value: [0.92982456 0.92280702 0.91929825 0.93684211 0.92631579 0.93333333 0.92280702 0.91608392 0.91958042 0.94405594] mean value: 0.9270948349895718 key: test_roc_auc value: [0.828125 0.828125 0.796875 0.859375 0.80796371 0.80897177 0.85584677 0.76360887 0.81149194 0.80796371] mean value: 0.8168346774193548 key: train_roc_auc value: [0.89122807 0.89122807 0.8877193 0.89473684 0.89147957 0.89848485 0.88972519 0.88611213 0.886106 0.89834376] mean value: 0.8915163783584836 key: test_jcc value: [0.71052632 0.725 0.66666667 0.75 0.70731707 0.69230769 0.76923077 0.64285714 0.70731707 0.64705882] mean value: 0.701828155672262 key: train_jcc value: [0.81039755 0.80923077 0.80368098 0.81651376 0.80981595 0.82098765 0.80674847 0.80122324 0.80182927 0.82317073] mean value: 0.8103598378899687 MCC on Blind test: 0.47 Accuracy on Blind test: 0.74 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.18102288 0.29600692 0.18413353 0.21506524 0.22669625 0.33359575 0.33408237 0.31031942 0.31877875 0.30942845] mean value: 0.2709129571914673 key: score_time value: [0.01897454 0.01883221 0.01218867 0.01218319 0.01881433 0.01916432 0.01888371 0.01883912 0.01884961 0.01881719] mean value: 0.017554688453674316 key: test_mcc value: [0.65657067 0.6644106 0.56360186 0.790965 0.68740835 0.61982085 0.72270545 0.53874599 0.64134943 0.62939541] mean value: 0.6514973606605419 key: train_mcc value: [0.78479784 0.7840214 0.80845708 0.80881692 0.83990276 0.79883396 0.78109075 0.77367788 0.77401834 0.80014056] mean value: 0.7953757483531136 key: test_accuracy value: [0.828125 0.828125 0.78125 0.890625 0.84126984 0.80952381 0.85714286 0.76190476 0.80952381 0.80952381] mean value: 0.8217013888888889 key: train_accuracy value: [0.89122807 0.89122807 0.90350877 0.90350877 0.91943958 0.89842382 0.88966725 0.88616462 0.88616462 0.89842382] mean value: 0.8967757396995115 key: test_fscore value: [0.83076923 0.84057971 0.77419355 0.88135593 0.85294118 0.81818182 0.86956522 0.7826087 0.82857143 0.78571429] mean value: 0.8264481043486244 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:136: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:139: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.89527027 0.89455782 0.90630324 0.90662139 0.92123288 0.90169492 0.89303905 0.88964346 0.89001692 0.90301003] mean value: 0.9001389981005551 key: test_precision value: [0.81818182 0.78378378 0.8 0.96296296 0.80555556 0.79411765 0.81081081 0.71052632 0.74358974 0.88 ] mean value: 0.8109528637732972 key: train_precision value: [0.86319218 0.8679868 0.8807947 0.87828947 0.89966555 0.87213115 0.86513158 0.86468647 0.86229508 0.86538462] mean value: 0.8719557601087767 key: test_recall value: [0.84375 0.90625 0.75 0.8125 0.90625 0.84375 0.9375 0.87096774 0.93548387 0.70967742] mean value: 0.8516129032258064 key: train_recall value: [0.92982456 0.92280702 0.93333333 0.93684211 0.94385965 0.93333333 0.92280702 0.91608392 0.91958042 0.94405594] mean value: 0.9302527297264139 key: test_roc_auc value: [0.828125 0.828125 0.78125 0.890625 0.84022177 0.80897177 0.85584677 0.76360887 0.81149194 0.80796371] mean value: 0.8216229838709678 key: train_roc_auc value: [0.89122807 0.89122807 0.90350877 0.90350877 0.91948227 0.89848485 0.88972519 0.88611213 0.886106 0.89834376] mean value: 0.8967727886148938 key: test_jcc value: [0.71052632 0.725 0.63157895 0.78787879 0.74358974 0.69230769 0.76923077 0.64285714 0.70731707 0.64705882] mean value: 0.7057345295722174 key: train_jcc value: [0.81039755 0.80923077 0.82866044 0.82919255 0.85396825 0.82098765 0.80674847 0.80122324 0.80182927 0.82317073] mean value: 0.8185408921605636 MCC on Blind test: 0.47 Accuracy on Blind test: 0.74 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03537369 0.03771615 0.03843212 0.03799272 0.0374248 0.0644691 0.06080461 0.05791664 0.03776813 0.04242468] mean value: 0.04503226280212402 key: score_time value: [0.01476192 0.01482749 0.01522851 0.01489902 0.0200181 0.02211308 0.01651382 0.02252531 0.01618242 0.01794791] mean value: 0.01750175952911377 key: test_mcc value: [0.56360186 0.62622429 0.46897905 0.75 0.50663549 0.65085805 0.65315611 0.5026181 0.56086231 0.65821474] mean value: 0.5941149998637926 key: train_mcc value: [0.72381022 0.71653529 0.69261083 0.71314164 0.72366652 0.7206799 0.72384769 0.73407475 0.73442906 0.70349326] mean value: 0.7186289159760348 key: test_accuracy value: [0.78125 0.8125 0.734375 0.875 0.74603175 0.82539683 0.82539683 0.74603175 0.77777778 0.82539683] mean value: 0.7949156746031746 key: train_accuracy value: [0.86140351 0.85789474 0.84561404 0.85614035 0.86164623 0.85989492 0.86164623 0.86690018 0.86690018 0.85113835] mean value: 0.8589178726149875 key: test_fscore value: [0.77419355 0.81818182 0.73015873 0.875 0.77777778 0.83076923 0.8358209 0.76470588 0.78787879 0.80701754] mean value: 0.800150421488842 key: train_fscore value: [0.86495726 0.86106346 0.85034014 0.85958904 0.86355786 0.8630137 0.86402754 0.86896552 0.86986301 0.85568761] mean value: 0.8621065139729672 key: test_precision value: [0.8 0.79411765 0.74193548 0.875 0.7 0.81818182 0.8 0.7027027 0.74285714 0.88461538] mean value: 0.785941017928684 key: train_precision value: [0.84333333 0.84228188 0.82508251 0.83946488 0.85034014 0.84280936 0.84797297 0.85714286 0.85234899 0.83168317] mean value: 0.8432460096046102 key: test_recall value: [0.75 0.84375 0.71875 0.875 0.875 0.84375 0.875 0.83870968 0.83870968 0.74193548] mean value: 0.8200604838709677 key: train_recall value: [0.8877193 0.88070175 0.87719298 0.88070175 0.87719298 0.88421053 0.88070175 0.88111888 0.88811189 0.88111888] mean value: 0.8818770702981229 key: test_roc_auc value: [0.78125 0.8125 0.734375 0.875 0.74395161 0.82510081 0.82459677 0.74747984 0.77872984 0.82409274] mean value: 0.7947076612903226 key: train_roc_auc value: [0.86140351 0.85789474 0.84561404 0.85614035 0.86167341 0.85993743 0.86167955 0.86687523 0.86686296 0.85108576] mean value: 0.85891669733775 key: test_jcc value: [0.63157895 0.69230769 0.575 0.77777778 0.63636364 0.71052632 0.71794872 0.61904762 0.65 0.67647059] mean value: 0.6687021294838632 key: train_jcc value: [0.76204819 0.7560241 0.73964497 0.75375375 0.75987842 0.75903614 0.76060606 0.76829268 0.76969697 0.74777448] mean value: 0.7576755771297808 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.83497286 0.88039732 0.9799037 0.91570306 1.0464592 0.90148354 1.06532311 0.94581509 1.02171636 0.94928885] mean value: 0.9541063070297241 key: score_time value: [0.01470733 0.01652384 0.01522231 0.01531196 0.0153327 0.01528358 0.01523829 0.01887465 0.02170634 0.0123136 ] mean value: 0.016051459312438964 key: test_mcc value: [0.62622429 0.790965 0.6011334 0.75592895 0.67763983 0.65315611 0.72270545 0.71790017 0.59049817 0.74722285] mean value: 0.6883374204864343 key: train_mcc value: [0.87377027 0.88425952 0.84975598 0.87745769 0.86709616 0.82520071 0.87767677 0.86742393 0.89511471 0.88484639] mean value: 0.8702602118509842 key: test_accuracy value: [0.8125 0.890625 0.796875 0.875 0.82539683 0.82539683 0.85714286 0.85714286 0.79365079 0.87301587] mean value: 0.8406746031746032 key: train_accuracy value: [0.93684211 0.94210526 0.9245614 0.93859649 0.93345009 0.91243433 0.93870403 0.93345009 0.9474606 0.94220665] mean value: 0.9349811042492395 key: test_fscore value: [0.80645161 0.89855072 0.77966102 0.86666667 0.84931507 0.8358209 0.86956522 0.86153846 0.8 0.86666667] mean value: 0.8434236330768697 key: train_fscore value: [0.93728223 0.94240838 0.92598967 0.93934142 0.93402778 0.91349481 0.93934142 0.9347079 0.94809689 0.94320138] mean value: 0.9357891876189721 key: test_precision value: [0.83333333 0.83783784 0.85185185 0.92857143 0.75609756 0.8 0.81081081 0.82352941 0.76470588 0.89655172] mean value: 0.830328984163645 key: train_precision value: [0.93079585 0.9375 0.90878378 0.92808219 0.92439863 0.90102389 0.92808219 0.91891892 0.93835616 0.92881356] mean value: 0.9244755173935344 key: test_recall value: [0.78125 0.96875 0.71875 0.8125 0.96875 0.875 0.9375 0.90322581 0.83870968 0.83870968] mean value: 0.8643145161290322 key: train_recall value: [0.94385965 0.94736842 0.94385965 0.95087719 0.94385965 0.92631579 0.95087719 0.95104895 0.95804196 0.95804196] mean value: 0.9474150410992517 key: test_roc_auc value: [0.8125 0.890625 0.796875 0.875 0.82308468 0.82459677 0.85584677 0.8578629 0.79435484 0.87247984] mean value: 0.8403225806451613 key: train_roc_auc value: [0.93684211 0.94210526 0.9245614 0.93859649 0.93346829 0.91245859 0.93872531 0.93341921 0.94744203 0.94217887] mean value: 0.9349797570850202 key: test_jcc value: [0.67567568 0.81578947 0.63888889 0.76470588 0.73809524 0.71794872 0.76923077 0.75675676 0.66666667 0.76470588] mean value: 0.7308463951652806 key: train_jcc value: [0.88196721 0.89108911 0.86217949 0.88562092 0.8762215 0.84076433 0.88562092 0.87741935 0.90131579 0.89250814] mean value: 0.8794706756486887 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01466441 0.01140618 0.0103066 0.01056147 0.01054859 0.01025057 0.01027822 0.01057982 0.01022387 0.01138043] mean value: 0.011020016670227051 key: score_time value: [0.01233292 0.00939727 0.00900912 0.00904417 0.0090394 0.00901127 0.00909233 0.00893521 0.00901937 0.00898433] mean value: 0.009386539459228516 key: test_mcc value: [0.40804713 0.59404013 0.50395263 0.65657067 0.4647426 0.36661779 0.46743768 0.36629686 0.33569416 0.37474278] mean value: 0.4538142428991961 key: train_mcc value: [0.50180381 0.47378922 0.47754881 0.46667816 0.49609564 0.48166961 0.43632328 0.49918401 0.516636 0.51038486] mean value: 0.4860113400738339 key: test_accuracy value: [0.703125 0.796875 0.75 0.828125 0.73015873 0.68253968 0.73015873 0.68253968 0.66666667 0.68253968] mean value: 0.7252728174603175 key: train_accuracy value: [0.75087719 0.73684211 0.73859649 0.73333333 0.74781086 0.7408056 0.71103327 0.74956217 0.75831874 0.75481611] mean value: 0.742199588287707 key: test_fscore value: [0.71641791 0.8 0.73333333 0.82539683 0.75362319 0.70588235 0.71186441 0.6875 0.67692308 0.62962963] mean value: 0.7240570723857261 key: train_fscore value: [0.75261324 0.73958333 0.74354561 0.73426573 0.75257732 0.74216028 0.66800805 0.75216638 0.75874126 0.76190476] mean value: 0.7405565964118 key: test_precision value: [0.68571429 0.78787879 0.78571429 0.83870968 0.7027027 0.66666667 0.77777778 0.66666667 0.64705882 0.73913043] mean value: 0.7298020108852549 key: train_precision value: [0.74740484 0.73195876 0.72972973 0.73170732 0.73737374 0.73702422 0.78301887 0.74570447 0.75874126 0.74172185] mean value: 0.7444385061131555 key: test_recall value: [0.75 0.8125 0.6875 0.8125 0.8125 0.75 0.65625 0.70967742 0.70967742 0.5483871 ] mean value: 0.724899193548387 key: train_recall value: [0.75789474 0.74736842 0.75789474 0.73684211 0.76842105 0.74736842 0.58245614 0.75874126 0.75874126 0.78321678] mean value: 0.7398944914734389 key: test_roc_auc value: [0.703125 0.796875 0.75 0.828125 0.72883065 0.68145161 0.73135081 0.68296371 0.66733871 0.68044355] mean value: 0.7250504032258065 key: train_roc_auc value: [0.75087719 0.73684211 0.73859649 0.73333333 0.74784689 0.74081708 0.71080849 0.74954607 0.758318 0.75476629] mean value: 0.7421751932278249 key: test_jcc value: [0.55813953 0.66666667 0.57894737 0.7027027 0.60465116 0.54545455 0.55263158 0.52380952 0.51162791 0.45945946] mean value: 0.5704090450112482 key: train_jcc value: [0.60335196 0.58677686 0.59178082 0.5801105 0.60330579 0.5900277 0.50151057 0.60277778 0.61126761 0.61538462] mean value: 0.5886294192736087 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01070476 0.01045203 0.01045537 0.01056457 0.01051593 0.01061487 0.01065612 0.01072598 0.01050091 0.01057959] mean value: 0.010577011108398437 key: score_time value: [0.00903749 0.00892234 0.00901079 0.00903344 0.00917768 0.00899863 0.00909567 0.00893569 0.0089848 0.00909686] mean value: 0.009029340744018555 key: test_mcc value: [0.53150959 0.50395263 0.21971769 0.44539933 0.49493401 0.52419355 0.46146899 0.37363667 0.33366935 0.58770161] mean value: 0.4476183430779835 key: train_mcc value: [0.55284089 0.54886043 0.51932702 0.52686418 0.54489338 0.48908468 0.55937838 0.5535563 0.52463264 0.52543149] mean value: 0.5344869395135066 key: test_accuracy value: [0.765625 0.75 0.609375 0.71875 0.74603175 0.76190476 0.73015873 0.68253968 0.66666667 0.79365079] mean value: 0.7224702380952381 key: train_accuracy value: [0.7754386 0.77368421 0.75964912 0.76315789 0.77232925 0.74430823 0.7793345 0.77583187 0.76182137 0.76182137] mean value: 0.7667376409500107 key: test_fscore value: [0.76190476 0.76470588 0.59016393 0.74285714 0.76470588 0.76190476 0.74626866 0.70588235 0.66666667 0.79365079] mean value: 0.7298710835773833 key: train_fscore value: [0.78451178 0.78172589 0.7609075 0.76843911 0.77508651 0.74914089 0.78424658 0.7852349 0.76949153 0.77181208] mean value: 0.7730596764554477 key: test_precision value: [0.77419355 0.72222222 0.62068966 0.68421053 0.72222222 0.77419355 0.71428571 0.64864865 0.65625 0.78125 ] mean value: 0.7098166085641204 key: train_precision value: [0.75404531 0.75490196 0.75694444 0.75167785 0.76450512 0.73400673 0.76588629 0.75483871 0.74671053 0.74193548] mean value: 0.7525452425971371 key: test_recall value: [0.75 0.8125 0.5625 0.8125 0.8125 0.75 0.78125 0.77419355 0.67741935 0.80645161] mean value: 0.7539314516129032 key: train_recall value: [0.81754386 0.81052632 0.76491228 0.78596491 0.78596491 0.76491228 0.80350877 0.81818182 0.79370629 0.8041958 ] mean value: 0.794941724941725 key: test_roc_auc value: [0.765625 0.75 0.609375 0.71875 0.74495968 0.76209677 0.72933468 0.68397177 0.66683468 0.79385081] mean value: 0.7224798387096774 key: train_roc_auc value: [0.7754386 0.77368421 0.75964912 0.76315789 0.77235309 0.74434425 0.77937676 0.77575758 0.76176543 0.76174702] mean value: 0.766727395411606 key: test_jcc value: [0.61538462 0.61904762 0.41860465 0.59090909 0.61904762 0.61538462 0.5952381 0.54545455 0.5 0.65789474] mean value: 0.5776965588471097 key: train_jcc value: [0.64542936 0.64166667 0.61408451 0.62395543 0.63276836 0.5989011 0.64507042 0.64640884 0.62534435 0.6284153 ] mean value: 0.6302044344305446 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00986624 0.01092362 0.01090121 0.00989747 0.00991488 0.01093745 0.01010799 0.01105952 0.01068306 0.00987339] mean value: 0.010416483879089356 key: score_time value: [0.01277661 0.01334596 0.01341963 0.01303458 0.01297259 0.0130651 0.01314664 0.01294398 0.0128777 0.01310587] mean value: 0.013068866729736329 key: test_mcc value: [0.44539933 0.375 0.15694121 0.40804713 0.20585278 0.39757328 0.39656932 0.18817455 0.32137677 0.33569416] mean value: 0.32306285455062617 key: train_mcc value: [0.60515381 0.5803782 0.59629523 0.62879079 0.60786984 0.61994057 0.62361994 0.58865786 0.59821633 0.59511948] mean value: 0.6044042041011564 key: test_accuracy value: [0.71875 0.6875 0.578125 0.703125 0.6031746 0.6984127 0.6984127 0.58730159 0.65079365 0.66666667] mean value: 0.6592261904761905 key: train_accuracy value: [0.80175439 0.78947368 0.79649123 0.8122807 0.80210158 0.80910683 0.81085814 0.78984238 0.79859895 0.79509632] mean value: 0.8005604203152364 key: test_fscore value: [0.74285714 0.6875 0.55737705 0.71641791 0.61538462 0.71641791 0.70769231 0.63888889 0.69444444 0.67692308] mean value: 0.6753903346266327 key: train_fscore value: [0.80879865 0.79661017 0.80666667 0.8225539 0.81198003 0.81556684 0.81756757 0.80707395 0.80475382 0.80788177] mean value: 0.8099453364834789 key: test_precision value: [0.68421053 0.6875 0.5862069 0.68571429 0.60606061 0.68571429 0.6969697 0.56097561 0.6097561 0.64705882] mean value: 0.6450166828172873 key: train_precision value: [0.78104575 0.7704918 0.76825397 0.77987421 0.7721519 0.7875817 0.78827362 0.74702381 0.78217822 0.76160991] mean value: 0.7738484885185218 key: test_recall value: [0.8125 0.6875 0.53125 0.75 0.625 0.75 0.71875 0.74193548 0.80645161 0.70967742] mean value: 0.7133064516129032 key: train_recall value: [0.83859649 0.8245614 0.84912281 0.87017544 0.85614035 0.84561404 0.84912281 0.87762238 0.82867133 0.86013986] mean value: 0.84997668997669 key: test_roc_auc value: [0.71875 0.6875 0.578125 0.703125 0.60282258 0.69758065 0.69808468 0.58971774 0.65322581 0.66733871] mean value: 0.6596270161290323 key: train_roc_auc value: [0.80175439 0.78947368 0.79649123 0.8122807 0.80219605 0.80917065 0.81092504 0.78968838 0.79854619 0.79498221] mean value: 0.8005508526561158 key: test_jcc value: [0.59090909 0.52380952 0.38636364 0.55813953 0.44444444 0.55813953 0.54761905 0.46938776 0.53191489 0.51162791] mean value: 0.5122355368608992 key: train_jcc value: [0.67897727 0.66197183 0.67597765 0.69859155 0.68347339 0.68857143 0.69142857 0.67654987 0.67329545 0.67768595] mean value: 0.6806522966183778 MCC on Blind test: 0.25 Accuracy on Blind test: 0.63 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02857804 0.02860475 0.02841306 0.02905703 0.0280447 0.02880526 0.02847433 0.0280602 0.02867246 0.02894044] mean value: 0.028565025329589842 key: score_time value: [0.01360965 0.01362467 0.01358318 0.01370263 0.01335478 0.01353049 0.0134654 0.01333475 0.0136447 0.01354122] mean value: 0.01353914737701416 key: test_mcc value: [0.62622429 0.53150959 0.50395263 0.62622429 0.47011536 0.62325024 0.55611985 0.43812738 0.5026181 0.68740835] mean value: 0.5565550079906242 key: train_mcc value: [0.70869167 0.67488591 0.67886662 0.69787564 0.68539141 0.67649884 0.70019913 0.71082339 0.69984368 0.69230073] mean value: 0.6925377015905695 key: test_accuracy value: [0.8125 0.765625 0.75 0.8125 0.73015873 0.80952381 0.77777778 0.71428571 0.74603175 0.84126984] mean value: 0.775967261904762 key: train_accuracy value: [0.85263158 0.83684211 0.83859649 0.84736842 0.84238179 0.83712785 0.84938704 0.85464098 0.84938704 0.8441331 ] mean value: 0.8452496389836237 key: test_fscore value: [0.81818182 0.76923077 0.73333333 0.81818182 0.76056338 0.82352941 0.78787879 0.73529412 0.76470588 0.82758621] mean value: 0.7838485525749475 key: train_fscore value: [0.85953177 0.84156729 0.8440678 0.85427136 0.84536082 0.84317032 0.8537415 0.85956007 0.8537415 0.85240464] mean value: 0.8507417066756678 key: test_precision value: [0.79411765 0.75757576 0.78571429 0.79411765 0.69230769 0.77777778 0.76470588 0.67567568 0.7027027 0.88888889] mean value: 0.7633583957113369 key: train_precision value: [0.82108626 0.81788079 0.81639344 0.81730769 0.82828283 0.81168831 0.82838284 0.83278689 0.83112583 0.81072555] mean value: 0.8215660434979373 key: test_recall value: [0.84375 0.78125 0.6875 0.84375 0.84375 0.875 0.8125 0.80645161 0.83870968 0.77419355] mean value: 0.8106854838709677 key: train_recall value: [0.90175439 0.86666667 0.87368421 0.89473684 0.86315789 0.87719298 0.88070175 0.88811189 0.87762238 0.8986014 ] mean value: 0.882223040117777 key: test_roc_auc value: [0.8125 0.765625 0.75 0.8125 0.72832661 0.80846774 0.77721774 0.71572581 0.74747984 0.84022177] mean value: 0.7758064516129032 key: train_roc_auc value: [0.85263158 0.83684211 0.83859649 0.84736842 0.84241811 0.83719789 0.84944179 0.85458226 0.8493375 0.84403754] mean value: 0.8452453686664213 key: test_jcc value: [0.69230769 0.625 0.57894737 0.69230769 0.61363636 0.7 0.65 0.58139535 0.61904762 0.70588235] mean value: 0.6458524437498806 key: train_jcc value: [0.75366569 0.72647059 0.73020528 0.74561404 0.73214286 0.72886297 0.74480712 0.7537092 0.74480712 0.74277457] mean value: 0.7403059430579226 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.91805243 2.02247119 1.97110653 1.8196218 2.15600729 2.25056601 2.00501966 2.01817799 1.94140291 2.08676553] mean value: 2.0189191341400146 key: score_time value: [0.01340175 0.0231154 0.01601505 0.01246262 0.0241096 0.02291393 0.01671243 0.01519299 0.01519251 0.01688886] mean value: 0.017600512504577635 key: test_mcc value: [0.50097943 0.75592895 0.65657067 0.71910121 0.57258185 0.61103872 0.55611985 0.68415777 0.66853948 0.65315611] mean value: 0.6378174031294251 key: train_mcc value: [0.96857012 0.95087719 0.95453287 0.95146307 0.98949822 0.96176174 0.96497362 0.94760416 0.96218292 0.96161729] mean value: 0.9613081218627331 key: test_accuracy value: [0.75 0.875 0.828125 0.859375 0.77777778 0.79365079 0.77777778 0.84126984 0.82539683 0.82539683] mean value: 0.8153769841269841 key: train_accuracy value: [0.98421053 0.9754386 0.97719298 0.9754386 0.99474606 0.98073555 0.98248687 0.9737303 0.98073555 0.98073555] mean value: 0.9805450579162442 key: test_fscore value: [0.75757576 0.88235294 0.82539683 0.86153846 0.80555556 0.82191781 0.78787879 0.84375 0.84057971 0.81355932] mean value: 0.8240105169519862 key: train_fscore value: [0.98434783 0.9754386 0.9773913 0.97586207 0.99474606 0.98093588 0.98245614 0.97400347 0.98113208 0.98093588] mean value: 0.9807249287896543 key: test_precision value: [0.73529412 0.83333333 0.83870968 0.84848485 0.725 0.73170732 0.76470588 0.81818182 0.76315789 0.85714286] mean value: 0.7915717746372225 key: train_precision value: [0.97586207 0.9754386 0.96896552 0.95932203 0.99300699 0.96917808 0.98245614 0.96563574 0.96296296 0.97250859] mean value: 0.9725336725005951 key: test_recall value: [0.78125 0.9375 0.8125 0.875 0.90625 0.9375 0.8125 0.87096774 0.93548387 0.77419355] mean value: 0.8643145161290322 key: train_recall value: [0.99298246 0.9754386 0.98596491 0.99298246 0.99649123 0.99298246 0.98245614 0.98251748 1. 0.98951049] mean value: 0.9891326217642007 key: test_roc_auc value: [0.75 0.875 0.828125 0.859375 0.77570565 0.79133065 0.77721774 0.84173387 0.82711694 0.82459677] mean value: 0.8150201612903226 key: train_roc_auc value: [0.98421053 0.9754386 0.97719298 0.9754386 0.99474911 0.98075696 0.98248681 0.97371488 0.98070175 0.98072016] mean value: 0.9805410379094589 key: test_jcc value: [0.6097561 0.78947368 0.7027027 0.75675676 0.6744186 0.69767442 0.65 0.72972973 0.725 0.68571429] mean value: 0.7021226279930791 key: train_jcc value: [0.96917808 0.95205479 0.95578231 0.95286195 0.98954704 0.96258503 0.96551724 0.94932432 0.96296296 0.96258503] mean value: 0.9622398777520786 MCC on Blind test: 0.4 Accuracy on Blind test: 0.71 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.05209064 0.03377962 0.03262115 0.03452277 0.03428578 0.03552485 0.03424144 0.03228545 0.03130507 0.03644228] mean value: 0.03570990562438965 key: score_time value: [0.00966406 0.00889969 0.00893044 0.00901508 0.00900245 0.00888991 0.00902891 0.00895739 0.00901842 0.00905275] mean value: 0.009045910835266114 key: test_mcc value: [0.75592895 0.9375 0.75 0.78163175 0.93844649 0.68352185 0.68245968 0.76058095 0.62475802 0.78160117] mean value: 0.7696428848610658 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 0.96875 0.875 0.890625 0.96825397 0.84126984 0.84126984 0.87301587 0.80952381 0.88888889] mean value: 0.8831597222222223 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86666667 0.96875 0.875 0.89230769 0.96774194 0.84848485 0.84375 0.88235294 0.81818182 0.89230769] mean value: 0.8855543594609059 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92857143 0.96875 0.875 0.87878788 1. 0.82352941 0.84375 0.81081081 0.77142857 0.85294118] mean value: 0.8753569277833984 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.96875 0.875 0.90625 0.9375 0.875 0.84375 0.96774194 0.87096774 0.93548387] mean value: 0.8992943548387097 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 0.96875 0.875 0.890625 0.96875 0.84072581 0.84122984 0.87449597 0.81048387 0.88961694] mean value: 0.8834677419354838 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.76470588 0.93939394 0.77777778 0.80555556 0.9375 0.73684211 0.72972973 0.78947368 0.69230769 0.80555556] mean value: 0.7978841922146875 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.82 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.13985062 0.14144874 0.14082766 0.14119935 0.13991904 0.13978195 0.14032125 0.13998294 0.14152503 0.14098597] mean value: 0.14058425426483154 key: score_time value: [0.01828051 0.01824522 0.01849365 0.01862502 0.01824903 0.01838708 0.01830077 0.01825333 0.01820087 0.01816535] mean value: 0.018320083618164062 key: test_mcc value: [0.625 0.78163175 0.62622429 0.78163175 0.58728587 0.65821474 0.61982085 0.59049817 0.4969666 0.72270545] mean value: 0.648997946446946 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8125 0.890625 0.8125 0.890625 0.79365079 0.82539683 0.80952381 0.79365079 0.74603175 0.85714286] mean value: 0.8231646825396826 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8125 0.88888889 0.80645161 0.88888889 0.8 0.84057971 0.81818182 0.8 0.75757576 0.84210526] mean value: 0.8255171939741401 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8125 0.90322581 0.83333333 0.90322581 0.78787879 0.78378378 0.79411765 0.76470588 0.71428571 0.92307692] mean value: 0.8220133684673533 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.8125 0.875 0.78125 0.875 0.8125 0.90625 0.84375 0.83870968 0.80645161 0.77419355] mean value: 0.8325604838709677 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 0.890625 0.8125 0.890625 0.79334677 0.82409274 0.80897177 0.79435484 0.74697581 0.85584677] mean value: 0.822983870967742 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.68421053 0.8 0.67567568 0.8 0.66666667 0.725 0.69230769 0.66666667 0.6097561 0.72727273] mean value: 0.7047556052466194 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.71 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01070738 0.01049376 0.01071596 0.01058316 0.01123023 0.01092839 0.01083207 0.01064515 0.01091695 0.01072025] mean value: 0.01077733039855957 key: score_time value: [0.00892258 0.00890732 0.0089016 0.00887442 0.00891399 0.00899148 0.00892997 0.00894594 0.00897908 0.00902557] mean value: 0.008939194679260253 key: test_mcc value: [0.438357 0.78470603 0.474579 0.53150959 0.49193548 0.49193548 0.61982085 0.30914596 0.52419355 0.42986904] mean value: 0.5096051977610179 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71875 0.890625 0.734375 0.765625 0.74603175 0.74603175 0.80952381 0.65079365 0.76190476 0.71428571] mean value: 0.7537946428571428 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.70967742 0.89552239 0.75362319 0.76923077 0.75 0.75 0.81818182 0.67647059 0.76190476 0.71875 ] mean value: 0.760336093337298 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.73333333 0.85714286 0.7027027 0.75757576 0.75 0.75 0.79411765 0.62162162 0.75 0.6969697 ] mean value: 0.7413463616404793 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.6875 0.9375 0.8125 0.78125 0.75 0.75 0.84375 0.74193548 0.77419355 0.74193548] mean value: 0.7820564516129033 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.71875 0.890625 0.734375 0.765625 0.74596774 0.74596774 0.80897177 0.65221774 0.76209677 0.71471774] mean value: 0.7539314516129032 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.55 0.81081081 0.60465116 0.625 0.6 0.6 0.69230769 0.51111111 0.61538462 0.56097561] mean value: 0.6170241002161025 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.23 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.07888103 2.10924435 2.08753395 2.09503651 2.09699631 2.11549616 2.16317201 2.11698031 2.08939385 2.10804057] mean value: 2.10607750415802 key: score_time value: [0.09666562 0.09476447 0.09688997 0.09551358 0.10052085 0.10555124 0.09454823 0.10372686 0.09959555 0.09707618] mean value: 0.09848525524139404 key: test_mcc value: [0.8125 0.96922337 0.8125 0.84416229 0.78094752 0.75156646 0.58770161 0.87487431 0.68865372 0.87462485] mean value: 0.7996754145177033 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90625 0.984375 0.90625 0.921875 0.88888889 0.87301587 0.79365079 0.93650794 0.84126984 0.93650794] mean value: 0.898859126984127 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90625 0.98461538 0.90625 0.92307692 0.89552239 0.88235294 0.79365079 0.9375 0.84848485 0.93333333] mean value: 0.9011036612397455 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90625 0.96969697 0.90625 0.90909091 0.85714286 0.83333333 0.80645161 0.90909091 0.8 0.96551724] mean value: 0.8862823832637514 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90625 1. 0.90625 0.9375 0.9375 0.9375 0.78125 0.96774194 0.90322581 0.90322581] mean value: 0.9180443548387097 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90625 0.984375 0.90625 0.921875 0.88810484 0.87197581 0.79385081 0.93699597 0.8422379 0.9359879 ] mean value: 0.8987903225806452 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.82857143 0.96969697 0.82857143 0.85714286 0.81081081 0.78947368 0.65789474 0.88235294 0.73684211 0.875 ] mean value: 0.8236356962285755 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.66 Accuracy on Blind test: 0.83 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.02598143 1.0367229 0.99699593 0.98521328 1.00284791 0.98103404 1.01575923 1.06608391 1.04886723 1.05637908] mean value: 1.0215884923934937 key: score_time value: [0.26507068 0.28202915 0.27169871 0.27197194 0.25234628 0.2420013 0.26003218 0.26900506 0.314147 0.15186691] mean value: 0.25801692008972166 key: test_mcc value: [0.78470603 0.9375 0.81409158 0.84416229 0.81092385 0.78719616 0.61982085 0.78822824 0.68865372 0.87462485] mean value: 0.7949907576810107 key: train_mcc value: [0.93704978 0.92659532 0.9340293 0.93064988 0.94479673 0.92690929 0.93015524 0.9341391 0.94115006 0.92350399] mean value: 0.9328978692400133 key: test_accuracy value: [0.890625 0.96875 0.90625 0.921875 0.9047619 0.88888889 0.80952381 0.88888889 0.84126984 0.93650794] mean value: 0.895734126984127 key: train_accuracy value: [0.96842105 0.96315789 0.96666667 0.96491228 0.97197898 0.96322242 0.96497373 0.96672504 0.97022767 0.9614711 ] mean value: 0.9661756843948751 key: test_fscore value: [0.8852459 0.96875 0.90322581 0.92307692 0.90909091 0.89855072 0.81818182 0.89552239 0.84848485 0.93333333] mean value: 0.8983462652956172 key: train_fscore value: [0.96875 0.96360485 0.96729776 0.96563574 0.97250859 0.96373057 0.96527778 0.96740995 0.97084048 0.96219931] mean value: 0.9667255034318909 key: test_precision value: [0.93103448 0.96875 0.93333333 0.90909091 0.88235294 0.83783784 0.79411765 0.83333333 0.8 0.96551724] mean value: 0.8855367725968639 key: train_precision value: [0.95876289 0.95205479 0.94932432 0.94612795 0.95286195 0.94897959 0.95532646 0.94949495 0.95286195 0.94594595] mean value: 0.9511740805053392 key: test_recall value: [0.84375 0.96875 0.875 0.9375 0.9375 0.96875 0.84375 0.96774194 0.90322581 0.90322581] mean value: 0.9149193548387097 key: train_recall value: [0.97894737 0.9754386 0.98596491 0.98596491 0.99298246 0.97894737 0.9754386 0.98601399 0.98951049 0.97902098] mean value: 0.982822966507177 key: test_roc_auc value: [0.890625 0.96875 0.90625 0.921875 0.90423387 0.88760081 0.80897177 0.89012097 0.8422379 0.9359879 ] mean value: 0.8956653225806451 key: train_roc_auc value: [0.96842105 0.96315789 0.96666667 0.96491228 0.9720157 0.96324991 0.96499203 0.9666912 0.97019384 0.96144031] mean value: 0.9661740890688258 key: test_jcc value: [0.79411765 0.93939394 0.82352941 0.85714286 0.83333333 0.81578947 0.69230769 0.81081081 0.73684211 0.875 ] mean value: 0.817826727075953 key: train_jcc value: [0.93939394 0.92976589 0.93666667 0.93355482 0.94648829 0.93 0.93288591 0.93687708 0.94333333 0.92715232] mean value: 0.9356118237604717 MCC on Blind test: 0.69 Accuracy on Blind test: 0.85 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0240097 0.0106535 0.01081276 0.01077318 0.01083565 0.01066613 0.01063871 0.01072431 0.01179624 0.01075387] mean value: 0.012166404724121093 key: score_time value: [0.00999904 0.0090971 0.00928092 0.009094 0.00921273 0.00915384 0.00908613 0.00915599 0.00989032 0.00916982] mean value: 0.00931398868560791 key: test_mcc value: [0.53150959 0.50395263 0.21971769 0.44539933 0.49493401 0.52419355 0.46146899 0.37363667 0.33366935 0.58770161] mean value: 0.4476183430779835 key: train_mcc value: [0.55284089 0.54886043 0.51932702 0.52686418 0.54489338 0.48908468 0.55937838 0.5535563 0.52463264 0.52543149] mean value: 0.5344869395135066 key: test_accuracy value: [0.765625 0.75 0.609375 0.71875 0.74603175 0.76190476 0.73015873 0.68253968 0.66666667 0.79365079] mean value: 0.7224702380952381 key: train_accuracy value: [0.7754386 0.77368421 0.75964912 0.76315789 0.77232925 0.74430823 0.7793345 0.77583187 0.76182137 0.76182137] mean value: 0.7667376409500107 key: test_fscore value: [0.76190476 0.76470588 0.59016393 0.74285714 0.76470588 0.76190476 0.74626866 0.70588235 0.66666667 0.79365079] mean value: 0.7298710835773833 key: train_fscore value: [0.78451178 0.78172589 0.7609075 0.76843911 0.77508651 0.74914089 0.78424658 0.7852349 0.76949153 0.77181208] mean value: 0.7730596764554477 key: test_precision value: [0.77419355 0.72222222 0.62068966 0.68421053 0.72222222 0.77419355 0.71428571 0.64864865 0.65625 0.78125 ] mean value: 0.7098166085641204 key: train_precision value: [0.75404531 0.75490196 0.75694444 0.75167785 0.76450512 0.73400673 0.76588629 0.75483871 0.74671053 0.74193548] mean value: 0.7525452425971371 key: test_recall value: [0.75 0.8125 0.5625 0.8125 0.8125 0.75 0.78125 0.77419355 0.67741935 0.80645161] mean value: 0.7539314516129032 key: train_recall value: [0.81754386 0.81052632 0.76491228 0.78596491 0.78596491 0.76491228 0.80350877 0.81818182 0.79370629 0.8041958 ] mean value: 0.794941724941725 key: test_roc_auc value: [0.765625 0.75 0.609375 0.71875 0.74495968 0.76209677 0.72933468 0.68397177 0.66683468 0.79385081] mean value: 0.7224798387096774 key: train_roc_auc value: [0.7754386 0.77368421 0.75964912 0.76315789 0.77235309 0.74434425 0.77937676 0.77575758 0.76176543 0.76174702] mean value: 0.766727395411606 key: test_jcc value: [0.61538462 0.61904762 0.41860465 0.59090909 0.61904762 0.61538462 0.5952381 0.54545455 0.5 0.65789474] mean value: 0.5776965588471097 key: train_jcc value: [0.64542936 0.64166667 0.61408451 0.62395543 0.63276836 0.5989011 0.64507042 0.64640884 0.62534435 0.6284153 ] mean value: 0.6302044344305446 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.10538983 0.1000948 0.09575319 0.09839797 0.26285148 0.09676743 0.10008001 0.10182142 0.10213518 0.10413456] mean value: 0.11674258708953858 key: score_time value: [0.01122642 0.01133132 0.0112431 0.01126313 0.01146555 0.01128602 0.01158309 0.01123977 0.01124072 0.01122522] mean value: 0.011310434341430664 key: test_mcc value: [0.875 0.96922337 0.84416229 0.84416229 0.90524194 0.78094752 0.68415777 0.84530217 0.68865372 1. ] mean value: 0.8436851064509476 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.984375 0.921875 0.921875 0.95238095 0.88888889 0.84126984 0.92063492 0.84126984 1. ] mean value: 0.9210069444444444 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9375 0.98461538 0.92063492 0.92307692 0.95238095 0.89552239 0.83870968 0.92307692 0.84848485 1. ] mean value: 0.9224002017749009 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9375 0.96969697 0.93548387 0.90909091 0.96774194 0.85714286 0.86666667 0.88235294 0.8 1. ] mean value: 0.9125676150225486 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 1. 0.90625 0.9375 0.9375 0.9375 0.8125 0.96774194 0.90322581 1. ] mean value: 0.9339717741935484 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.984375 0.921875 0.921875 0.95262097 0.88810484 0.84173387 0.92137097 0.8422379 1. ] mean value: 0.9211693548387097 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88235294 0.96969697 0.85294118 0.85714286 0.90909091 0.81081081 0.72222222 0.85714286 0.73684211 1. ] mean value: 0.8598242849016843 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.74 Accuracy on Blind test: 0.87 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04755354 0.09139705 0.07708144 0.09074998 0.06661797 0.10064197 0.05450773 0.06642532 0.05162716 0.04805136] mean value: 0.06946535110473633 key: score_time value: [0.01876473 0.01898789 0.01507235 0.01241088 0.01922154 0.01299763 0.02347302 0.01298523 0.02243018 0.02135372] mean value: 0.017769718170166017 key: test_mcc value: [0.47082362 0.67253825 0.438357 0.73658951 0.6385282 0.68740835 0.71443023 0.66853948 0.51058887 0.68245968] mean value: 0.62202631854168 key: train_mcc value: [0.82329065 0.83911319 0.81853204 0.81252947 0.84607646 0.83921607 0.80796777 0.81578865 0.83960889 0.82280711] mean value: 0.8264930306653335 key: test_accuracy value: [0.734375 0.828125 0.71875 0.859375 0.80952381 0.84126984 0.85714286 0.82539683 0.74603175 0.84126984] mean value: 0.806125992063492 key: train_accuracy value: [0.91052632 0.91929825 0.90877193 0.90526316 0.92294221 0.91943958 0.90367776 0.90718039 0.91943958 0.91068301] mean value: 0.9127222171014225 key: test_fscore value: [0.72131148 0.84507042 0.70967742 0.84210526 0.83333333 0.85294118 0.86153846 0.84057971 0.77142857 0.83870968] mean value: 0.8116695510793017 key: train_fscore value: [0.91370558 0.92068966 0.9109589 0.90847458 0.92361111 0.92041522 0.90533563 0.91001698 0.92123288 0.91341256] mean value: 0.9147853101869589 key: test_precision value: [0.75862069 0.76923077 0.73333333 0.96 0.75 0.80555556 0.84848485 0.76315789 0.69230769 0.83870968] mean value: 0.7919400460723568 key: train_precision value: [0.88235294 0.90508475 0.88963211 0.87868852 0.91408935 0.90784983 0.88851351 0.88448845 0.90268456 0.88778878] mean value: 0.8941172799978007 key: test_recall value: [0.6875 0.9375 0.6875 0.75 0.9375 0.90625 0.875 0.93548387 0.87096774 0.83870968] mean value: 0.8426411290322581 key: train_recall value: [0.94736842 0.93684211 0.93333333 0.94035088 0.93333333 0.93333333 0.92280702 0.93706294 0.94055944 0.94055944] mean value: 0.936555023923445 key: test_roc_auc value: [0.734375 0.828125 0.71875 0.859375 0.80745968 0.84022177 0.85685484 0.82711694 0.74798387 0.84122984] mean value: 0.8061491935483871 key: train_roc_auc value: [0.91052632 0.91929825 0.90877193 0.90526316 0.92296037 0.91946387 0.9037112 0.90712796 0.91940253 0.9106306 ] mean value: 0.9127156177156177 key: test_jcc value: [0.56410256 0.73170732 0.55 0.72727273 0.71428571 0.74358974 0.75675676 0.725 0.62790698 0.72222222] mean value: 0.6862844022047085 key: train_jcc value: [0.8411215 0.85303514 0.83647799 0.83229814 0.85806452 0.8525641 0.82704403 0.83489097 0.85396825 0.840625 ] mean value: 0.8430089626715126 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01229477 0.01073599 0.01056314 0.01035595 0.01048636 0.01037979 0.01035643 0.0102849 0.01053333 0.01048923] mean value: 0.010647988319396973 key: score_time value: [0.010355 0.0092082 0.00895095 0.00888205 0.00895739 0.00913954 0.00916409 0.00913596 0.00908208 0.00891948] mean value: 0.009179472923278809 key: test_mcc value: [0.51639778 0.50395263 0.40644851 0.65657067 0.57258185 0.59372402 0.4647426 0.37363667 0.4969666 0.68245968] mean value: 0.5267481010393584 key: train_mcc value: [0.54984141 0.56727781 0.53204317 0.56144502 0.55001067 0.51487506 0.5477723 0.5887123 0.58703803 0.55797288] mean value: 0.5556988650980035 key: test_accuracy value: [0.75 0.75 0.703125 0.828125 0.77777778 0.79365079 0.73015873 0.68253968 0.74603175 0.84126984] mean value: 0.7602678571428572 key: train_accuracy value: [0.77368421 0.78245614 0.76491228 0.77894737 0.77408056 0.75656743 0.77232925 0.79334501 0.7915937 0.77758319] mean value: 0.7765499124343258 key: test_fscore value: [0.77777778 0.76470588 0.70769231 0.83076923 0.80555556 0.8115942 0.75362319 0.70588235 0.75757576 0.83870968] mean value: 0.7753885933388449 key: train_fscore value: [0.7839196 0.79194631 0.77516779 0.79069767 0.78246206 0.76559865 0.78333333 0.80201342 0.80330579 0.78868552] mean value: 0.7867130140033903 key: test_precision value: [0.7 0.72222222 0.6969697 0.81818182 0.725 0.75675676 0.7027027 0.64864865 0.71428571 0.83870968] mean value: 0.7323477237186915 key: train_precision value: [0.75 0.75884244 0.74276527 0.75078864 0.75324675 0.73701299 0.74603175 0.77096774 0.76175549 0.75238095] mean value: 0.7523792027076264 key: test_recall value: [0.875 0.8125 0.71875 0.84375 0.90625 0.875 0.8125 0.77419355 0.80645161 0.83870968] mean value: 0.8263104838709677 key: train_recall value: [0.82105263 0.82807018 0.81052632 0.83508772 0.81403509 0.79649123 0.8245614 0.83566434 0.84965035 0.82867133] mean value: 0.8243810575389523 key: test_roc_auc value: [0.75 0.75 0.703125 0.828125 0.77570565 0.79233871 0.72883065 0.68397177 0.74697581 0.84122984] mean value: 0.7600302419354839 key: train_roc_auc value: [0.77368421 0.78245614 0.76491228 0.77894737 0.77415041 0.75663722 0.77242056 0.79327076 0.79149184 0.77749356] mean value: 0.7765464360201202 key: test_jcc value: [0.63636364 0.61904762 0.54761905 0.71052632 0.6744186 0.68292683 0.60465116 0.54545455 0.6097561 0.72222222] mean value: 0.6352986080767673 key: train_jcc value: [0.6446281 0.65555556 0.63287671 0.65384615 0.64265928 0.62021858 0.64383562 0.66946779 0.67127072 0.6510989 ] mean value: 0.6485457402801543 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01563811 0.02035022 0.02358508 0.02702594 0.0256815 0.02155948 0.02621078 0.02639723 0.02369452 0.02356243] mean value: 0.023370528221130372 key: score_time value: [0.01053476 0.01212144 0.01214385 0.0119803 0.01212549 0.01194334 0.01219034 0.01225972 0.01224947 0.01227522] mean value: 0.011982393264770509 key: test_mcc value: [0.55603844 0.6011334 0.4163332 0.75146915 0.41887185 0.56796183 0.5180609 0.50132936 0.46068548 0.56086231] mean value: 0.5352745927373167 key: train_mcc value: [0.54890295 0.70877629 0.70214646 0.73373289 0.72976328 0.45009303 0.58267032 0.66545422 0.74097799 0.67202979] mean value: 0.6534547204028045 key: test_accuracy value: [0.75 0.796875 0.703125 0.875 0.68253968 0.74603175 0.71428571 0.71428571 0.73015873 0.77777778] mean value: 0.7490079365079365 key: train_accuracy value: [0.73684211 0.85438596 0.84912281 0.86140351 0.85464098 0.67250438 0.75656743 0.81085814 0.8704028 0.81611208] mean value: 0.8082840200325683 key: test_fscore value: [0.79487179 0.77966102 0.66666667 0.87878788 0.75 0.8 0.7804878 0.76923077 0.73015873 0.78787879] mean value: 0.7737743449421829 key: train_fscore value: [0.78991597 0.85464098 0.84074074 0.8723748 0.86970173 0.75166003 0.80283688 0.83976261 0.86925795 0.84304933] mean value: 0.8333941007922129 key: test_precision value: [0.67391304 0.85185185 0.76 0.85294118 0.625 0.66666667 0.64 0.63829787 0.71875 0.74285714] mean value: 0.7170277753664936 key: train_precision value: [0.65734266 0.85314685 0.89019608 0.80838323 0.78693182 0.60470085 0.67380952 0.72938144 0.87857143 0.73629243] mean value: 0.7618756319214844 key: test_recall value: [0.96875 0.71875 0.59375 0.90625 0.9375 1. 1. 0.96774194 0.74193548 0.83870968] mean value: 0.8673387096774193 key: train_recall value: [0.98947368 0.85614035 0.79649123 0.94736842 0.97192982 0.99298246 0.99298246 0.98951049 0.86013986 0.98601399] mean value: 0.9383032756716967 key: test_roc_auc value: [0.75 0.796875 0.703125 0.875 0.67842742 0.74193548 0.70967742 0.71824597 0.73034274 0.77872984] mean value: 0.7482358870967742 key: train_roc_auc value: [0.73684211 0.85438596 0.84912281 0.86140351 0.85484603 0.67306465 0.75698074 0.81054472 0.87042081 0.81581401] mean value: 0.8083425346583241 key: test_jcc value: [0.65957447 0.63888889 0.5 0.78378378 0.6 0.66666667 0.64 0.625 0.575 0.65 ] mean value: 0.6338913807424446 key: train_jcc value: [0.65277778 0.74617737 0.72523962 0.77363897 0.76944444 0.60212766 0.67061611 0.72378517 0.76875 0.72868217] mean value: 0.7161239287449186 MCC on Blind test: 0.5 Accuracy on Blind test: 0.7 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.03134894 0.02866793 0.02440834 0.02831697 0.03325534 0.02925897 0.0286274 0.02725267 0.02873611 0.02479219] mean value: 0.02846648693084717 key: score_time value: [0.01226878 0.01199651 0.01198626 0.01231027 0.01204038 0.012254 0.01232457 0.01247358 0.01225519 0.01198483] mean value: 0.012189435958862304 key: test_mcc value: [0.49916874 0.64549722 0.31311215 0.78470603 0.54443762 0.62939541 0.59049817 0.56126657 0.36114822 0.49283288] mean value: 0.5422063009133407 key: train_mcc value: [0.62861856 0.78488853 0.75958458 0.78488853 0.78400945 0.7560971 0.75332186 0.7666719 0.57667378 0.48335361] mean value: 0.7078107890020772 key: test_accuracy value: [0.734375 0.8125 0.65625 0.890625 0.76190476 0.80952381 0.79365079 0.76190476 0.65079365 0.6984127 ] mean value: 0.7569940476190476 key: train_accuracy value: [0.79473684 0.88947368 0.87894737 0.88947368 0.88966725 0.87390543 0.87565674 0.87565674 0.76007005 0.69702277] mean value: 0.8424610563185547 key: test_fscore value: [0.67924528 0.83333333 0.64516129 0.8852459 0.79452055 0.82857143 0.78688525 0.79452055 0.5 0.55813953] mean value: 0.7305623113561326 key: train_fscore value: [0.75159236 0.89586777 0.88285229 0.89586777 0.89517471 0.88235294 0.87067395 0.88712242 0.69487751 0.57493857] mean value: 0.8231320285575312 key: test_precision value: [0.85714286 0.75 0.66666667 0.93103448 0.70731707 0.76315789 0.82758621 0.69047619 0.84615385 1. ] mean value: 0.8039535218002306 key: train_precision value: [0.9516129 0.846875 0.85526316 0.846875 0.85126582 0.82568807 0.90530303 0.81341108 0.95705521 0.96694215] mean value: 0.8820291429804338 key: test_recall value: [0.5625 0.9375 0.625 0.84375 0.90625 0.90625 0.75 0.93548387 0.35483871 0.38709677] mean value: 0.7208669354838709 key: train_recall value: [0.62105263 0.95087719 0.9122807 0.95087719 0.94385965 0.94736842 0.83859649 0.97552448 0.54545455 0.40909091] mean value: 0.8094982210771684 key: test_roc_auc value: [0.734375 0.8125 0.65625 0.890625 0.75957661 0.80796371 0.79435484 0.76461694 0.64616935 0.69354839] mean value: 0.7559979838709677 key: train_roc_auc value: [0.79473684 0.88947368 0.87894737 0.88947368 0.88976199 0.87403386 0.87559195 0.87548154 0.76044657 0.69752791] mean value: 0.8425475401791191 key: test_jcc value: [0.51428571 0.71428571 0.47619048 0.79411765 0.65909091 0.70731707 0.64864865 0.65909091 0.33333333 0.38709677] mean value: 0.5893457199348808 key: train_jcc value: [0.60204082 0.81137725 0.79027356 0.81137725 0.81024096 0.78947368 0.77096774 0.79714286 0.53242321 0.40344828] mean value: 0.7118765594772982 MCC on Blind test: 0.51 Accuracy on Blind test: 0.73 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.21471906 0.19875097 0.20195174 0.19802856 0.19768119 0.19887018 0.1979003 0.2040205 0.20287895 0.20416617] mean value: 0.2018967628479004 key: score_time value: [0.01584053 0.01595926 0.01975155 0.01568866 0.01623011 0.015908 0.01558733 0.0160017 0.01708651 0.01687288] mean value: 0.016492652893066406 key: test_mcc value: [0.81409158 0.875 0.78163175 0.84416229 0.87487431 0.78094752 0.61982085 0.73343622 0.71790017 0.8415746 ] mean value: 0.7883439285886955 key: train_mcc value: [0.93704978 0.94406888 0.95108798 0.95097086 0.94760737 0.94416837 0.96862577 0.94404909 0.95117136 0.95817844] mean value: 0.949697790480524 key: test_accuracy value: [0.90625 0.9375 0.890625 0.921875 0.93650794 0.88888889 0.80952381 0.85714286 0.85714286 0.92063492] mean value: 0.892609126984127 key: train_accuracy value: [0.96842105 0.97192982 0.9754386 0.9754386 0.9737303 0.97197898 0.98423818 0.97197898 0.97548161 0.97898424] mean value: 0.9747620364396105 key: test_fscore value: [0.90909091 0.9375 0.89230769 0.92307692 0.93548387 0.89552239 0.81818182 0.86956522 0.86153846 0.91803279] mean value: 0.8960300067499798 key: train_fscore value: [0.96875 0.97222222 0.97569444 0.97560976 0.97391304 0.97222222 0.98434783 0.97222222 0.97577855 0.97923875] mean value: 0.9749999037811952 key: test_precision value: [0.88235294 0.9375 0.87878788 0.90909091 0.96666667 0.85714286 0.79411765 0.78947368 0.82352941 0.93333333] mean value: 0.8771995329232172 key: train_precision value: [0.95876289 0.96219931 0.96563574 0.96885813 0.96551724 0.96219931 0.97586207 0.96551724 0.96575342 0.96917808] mean value: 0.9659483440920449 key: test_recall value: [0.9375 0.9375 0.90625 0.9375 0.90625 0.9375 0.84375 0.96774194 0.90322581 0.90322581] mean value: 0.9180443548387097 key: train_recall value: [0.97894737 0.98245614 0.98596491 0.98245614 0.98245614 0.98245614 0.99298246 0.97902098 0.98601399 0.98951049] mean value: 0.9842264752791068 key: test_roc_auc value: [0.90625 0.9375 0.890625 0.921875 0.93699597 0.88810484 0.80897177 0.85887097 0.8578629 0.9203629 ] mean value: 0.8927419354838709 key: train_roc_auc value: [0.96842105 0.97192982 0.9754386 0.9754386 0.97374555 0.9719973 0.98425347 0.97196663 0.97546313 0.97896577] mean value: 0.9747619923935713 key: test_jcc value: [0.83333333 0.88235294 0.80555556 0.85714286 0.87878788 0.81081081 0.69230769 0.76923077 0.75675676 0.84848485] mean value: 0.8134763443586973 key: train_jcc value: [0.93939394 0.94594595 0.95254237 0.95238095 0.94915254 0.94594595 0.96917808 0.94594595 0.9527027 0.95932203] mean value: 0.9512510463659756 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.08887267 0.07713723 0.12714386 0.11457253 0.09450245 0.09540319 0.12129784 0.11899137 0.11339378 0.13355446] mean value: 0.1084869384765625 key: score_time value: [0.02319431 0.02583909 0.04017091 0.02797508 0.02424788 0.02266908 0.04040813 0.02550125 0.03998375 0.0358336 ] mean value: 0.030582308769226074 key: test_mcc value: [0.8125 0.90669283 0.75 0.81409158 0.87298387 0.72270545 0.68245968 0.84530217 0.68415777 0.93649194] mean value: 0.8027385275255405 key: train_mcc value: [0.98596491 0.98606204 0.98947978 0.98947978 0.98949809 0.98954653 0.99650345 0.98598945 0.98263804 0.98601347] mean value: 0.9881175538085738 key: test_accuracy value: [0.90625 0.953125 0.875 0.90625 0.93650794 0.85714286 0.84126984 0.92063492 0.84126984 0.96825397] mean value: 0.9005704365079366 key: train_accuracy value: [0.99298246 0.99298246 0.99473684 0.99473684 0.99474606 0.99474606 0.99824869 0.99299475 0.99124343 0.99299475] mean value: 0.9940412326788951 key: test_fscore value: [0.90625 0.95238095 0.875 0.90909091 0.9375 0.86956522 0.84375 0.92307692 0.84375 0.96774194] mean value: 0.902810593742396 key: train_fscore value: [0.99298246 0.99303136 0.99474606 0.99472759 0.99472759 0.99470899 0.99824253 0.99300699 0.99118166 0.99303136] mean value: 0.994038659430934 key: test_precision value: [0.90625 0.96774194 0.875 0.88235294 0.9375 0.81081081 0.84375 0.88235294 0.81818182 0.96774194] mean value: 0.8891682382313312 key: train_precision value: [0.99298246 0.98615917 0.99300699 0.99647887 0.99647887 1. 1. 0.99300699 1. 0.98958333] mean value: 0.9947696691516716 key: test_recall value: [0.90625 0.9375 0.875 0.9375 0.9375 0.9375 0.84375 0.96774194 0.87096774 0.96774194] mean value: 0.9181451612903226 key: train_recall value: [0.99298246 1. 0.99649123 0.99298246 0.99298246 0.98947368 0.99649123 0.99300699 0.98251748 0.9965035 ] mean value: 0.9933431480799901 key: test_roc_auc value: [0.90625 0.953125 0.875 0.90625 0.93649194 0.85584677 0.84122984 0.92137097 0.84173387 0.96824597] mean value: 0.9005544354838709 key: train_roc_auc value: [0.99298246 0.99298246 0.99473684 0.99473684 0.99474298 0.99473684 0.99824561 0.99299472 0.99125874 0.99298859] mean value: 0.9940406085142927 key: test_jcc value: [0.82857143 0.90909091 0.77777778 0.83333333 0.88235294 0.76923077 0.72972973 0.85714286 0.72972973 0.9375 ] mean value: 0.8254459475783005 key: train_jcc value: [0.98606272 0.98615917 0.98954704 0.98951049 0.98951049 0.98947368 0.99649123 0.98611111 0.98251748 0.98615917] mean value: 0.9881542580128181 MCC on Blind test: 0.64 Accuracy on Blind test: 0.82 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.25840831 0.23501658 0.19894886 0.22410798 0.22995543 0.23805594 0.23612881 0.2686615 0.23128748 0.22726607] mean value: 0.2347836971282959 key: score_time value: [0.02715898 0.01644039 0.02714419 0.02729154 0.02735209 0.02731228 0.02736306 0.02728319 0.02738619 0.02738714] mean value: 0.02621190547943115 key: test_mcc value: [0.56360186 0.69293487 0.4375 0.65657067 0.40473508 0.5253647 0.52371369 0.48255984 0.48255984 0.58728587] mean value: 0.5356826409991767 key: train_mcc value: [0.96169363 0.96512618 0.96857012 0.9615515 0.95475466 0.96176174 0.96556818 0.95493785 0.96862386 0.95493785] mean value: 0.9617525571921755 key: test_accuracy value: [0.78125 0.84375 0.71875 0.828125 0.6984127 0.76190476 0.76190476 0.73015873 0.73015873 0.79365079] mean value: 0.7648065476190475 key: train_accuracy value: [0.98070175 0.98245614 0.98421053 0.98070175 0.97723292 0.98073555 0.98248687 0.97723292 0.98423818 0.97723292] mean value: 0.9807229544965742 key: test_fscore value: [0.78787879 0.85294118 0.71875 0.83076923 0.73239437 0.7761194 0.76923077 0.76056338 0.76056338 0.78688525] mean value: 0.7776095739996653 key: train_fscore value: [0.98093588 0.98263889 0.98434783 0.98086957 0.97746967 0.98093588 0.98275862 0.97762478 0.98440208 0.97762478] mean value: 0.9809607971456844 key: test_precision value: [0.76470588 0.80555556 0.71875 0.81818182 0.66666667 0.74285714 0.75757576 0.675 0.675 0.8 ] mean value: 0.7424292823189882 key: train_precision value: [0.96917808 0.97250859 0.97586207 0.97241379 0.96575342 0.96917808 0.96610169 0.96271186 0.97594502 0.96271186] mean value: 0.9692364483086298 key: test_recall value: [0.8125 0.90625 0.71875 0.84375 0.8125 0.8125 0.78125 0.87096774 0.87096774 0.77419355] mean value: 0.8203629032258064 key: train_recall value: [0.99298246 0.99298246 0.99298246 0.98947368 0.98947368 0.99298246 1. 0.99300699 0.99300699 0.99300699] mean value: 0.9929898172003435 key: test_roc_auc value: [0.78125 0.84375 0.71875 0.828125 0.69657258 0.76108871 0.76159274 0.73235887 0.73235887 0.79334677] mean value: 0.7649193548387097 key: train_roc_auc value: [0.98070175 0.98245614 0.98421053 0.98070175 0.97725432 0.98075696 0.98251748 0.97720525 0.98422279 0.97720525] mean value: 0.9807232241442767 key: test_jcc value: [0.65 0.74358974 0.56097561 0.71052632 0.57777778 0.63414634 0.625 0.61363636 0.61363636 0.64864865] mean value: 0.6377937164297883 key: train_jcc value: [0.96258503 0.96587031 0.96917808 0.96245734 0.9559322 0.96258503 0.96610169 0.95622896 0.96928328 0.95622896] mean value: 0.9626450882483695 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.83188581 0.84436393 0.82656622 0.82991767 0.82706404 0.83091283 0.82206988 0.83148551 0.82596612 0.82123756] mean value: 0.8291469573974609 key: score_time value: [0.00976157 0.0102365 0.0104444 0.00963187 0.00991011 0.01015711 0.00961208 0.00998163 0.0098474 0.00953841] mean value: 0.009912109375 key: test_mcc value: [0.875 0.90669283 0.8125 0.84416229 0.87487431 0.81572458 0.71471774 0.88034084 0.72407013 0.96875 ] mean value: 0.8416832721490438 key: train_mcc value: [0.98952851 0.98947978 0.99649736 0.98596491 0.98601381 0.99301918 0.98949809 0.98949809 0.99299472 0.97898417] mean value: 0.9891478630296477 key: test_accuracy value: [0.9375 0.953125 0.90625 0.921875 0.93650794 0.9047619 0.85714286 0.93650794 0.85714286 0.98412698] mean value: 0.9194940476190476 key: train_accuracy value: [0.99473684 0.99473684 0.99824561 0.99298246 0.99299475 0.99649737 0.99474606 0.99474606 0.99649737 0.98949212] mean value: 0.9945675484683688 key: test_fscore value: [0.9375 0.95384615 0.90625 0.92063492 0.93548387 0.91176471 0.85714286 0.93939394 0.86567164 0.98412698] mean value: 0.9211815073785995 key: train_fscore value: [0.9947644 0.99474606 0.99824253 0.99298246 0.99300699 0.9965035 0.99472759 0.9947644 0.9965035 0.98951049] mean value: 0.9945751910043851 key: test_precision value: [0.9375 0.93939394 0.90625 0.93548387 0.96666667 0.86111111 0.87096774 0.88571429 0.80555556 0.96875 ] mean value: 0.9077393171344784 key: train_precision value: [0.98958333 0.99300699 1. 0.99298246 0.98954704 0.99303136 0.99647887 0.99303136 0.9965035 0.98951049] mean value: 0.993367539783166 key: test_recall value: [0.9375 0.96875 0.90625 0.90625 0.90625 0.96875 0.84375 1. 0.93548387 1. ] mean value: 0.9372983870967742 key: train_recall value: [1. 0.99649123 0.99649123 0.99298246 0.99649123 1. 0.99298246 0.9965035 0.9965035 0.98951049] mean value: 0.9957956079008711 key: test_roc_auc value: [0.9375 0.953125 0.90625 0.921875 0.93699597 0.90372984 0.85735887 0.9375 0.85836694 0.984375 ] mean value: 0.9197076612903226 key: train_roc_auc value: [0.99473684 0.99473684 0.99824561 0.99298246 0.99300086 0.9965035 0.99474298 0.99474298 0.99649736 0.98949209] mean value: 0.9945681511470985 key: test_jcc value: [0.88235294 0.91176471 0.82857143 0.85294118 0.87878788 0.83783784 0.75 0.88571429 0.76315789 0.96875 ] mean value: 0.8559878149177684 key: train_jcc value: [0.98958333 0.98954704 0.99649123 0.98606272 0.98611111 0.99303136 0.98951049 0.98958333 0.99303136 0.97923875] mean value: 0.9892190723551298 MCC on Blind test: 0.7 Accuracy on Blind test: 0.86 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03697062 0.0324316 0.0339098 0.03319502 0.0436852 0.05119801 0.05784035 0.06083322 0.03914094 0.03315282] mean value: 0.04223575592041016 key: score_time value: [0.01292467 0.01329732 0.01285863 0.01501918 0.026052 0.02982116 0.0223608 0.01836944 0.01538515 0.01521349] mean value: 0.018130183219909668 key: test_mcc value: [0.1242473 0.32163376 0.25819889 0.28347335 0.13433882 0.15715464 0.43960456 0.22008521 0.31933319 0.34495882] mean value: 0.2603028546854528 key: train_mcc value: [0.34935261 0.32679675 0.33007486 0.33333333 0.34206181 0.3515425 0.32590867 0.33683398 0.33033226 0.38353707] mean value: 0.340977383652551 key: test_accuracy value: [0.546875 0.59375 0.59375 0.59375 0.53968254 0.55555556 0.68253968 0.53968254 0.58730159 0.63492063] mean value: 0.5867807539682539 key: train_accuracy value: [0.60877193 0.59649123 0.59824561 0.6 0.60420315 0.60945709 0.59544658 0.60245184 0.59894921 0.62872154] mean value: 0.6042738193996374 key: test_fscore value: [0.65882353 0.71111111 0.69767442 0.70454545 0.68131868 0.68181818 0.75609756 0.68131868 0.70454545 0.71604938] mean value: 0.699330245636564 key: train_fscore value: [0.71878941 0.7125 0.71339174 0.71428571 0.7160804 0.71878941 0.71161049 0.71589487 0.71410737 0.72959184] mean value: 0.7165041228602924 key: test_precision value: [0.52830189 0.55172414 0.55555556 0.55357143 0.52542373 0.53571429 0.62 0.51666667 0.54385965 0.58 ] mean value: 0.5510817339167791 key: train_precision value: [0.56102362 0.55339806 0.55447471 0.55555556 0.55772994 0.56102362 0.55232558 0.55750487 0.55533981 0.57429719] mean value: 0.5582672956635221 key: test_recall value: [0.875 1. 0.9375 0.96875 0.96875 0.9375 0.96875 1. 1. 0.93548387] mean value: 0.9591733870967742 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.546875 0.59375 0.59375 0.59375 0.5327621 0.54939516 0.67792339 0.546875 0.59375 0.63961694] mean value: 0.5868447580645162 key: train_roc_auc value: [0.60877193 0.59649123 0.59824561 0.6 0.6048951 0.61013986 0.59615385 0.60175439 0.59824561 0.62807018] mean value: 0.6042767758557233 key: test_jcc value: [0.49122807 0.55172414 0.53571429 0.54385965 0.51666667 0.51724138 0.60784314 0.51666667 0.54385965 0.55769231] mean value: 0.5382495949657261 key: train_jcc value: [0.56102362 0.55339806 0.55447471 0.55555556 0.55772994 0.56102362 0.55232558 0.55750487 0.55533981 0.57429719] mean value: 0.5582672956635221 MCC on Blind test: 0.04 Accuracy on Blind test: 0.46 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03842688 0.04042244 0.04043031 0.04043436 0.04095244 0.04054451 0.04072213 0.04055262 0.04073429 0.04077888] mean value: 0.0403998851776123 key: score_time value: [0.02156138 0.01931214 0.019135 0.01919627 0.01927257 0.01907492 0.01910901 0.01933813 0.01922321 0.01907396] mean value: 0.019429659843444823 key: test_mcc value: [0.59404013 0.69991324 0.50395263 0.69293487 0.56449867 0.62325024 0.65315611 0.57596915 0.63159952 0.68740835] mean value: 0.6226722903584234 key: train_mcc value: [0.79105298 0.77330677 0.77330677 0.78033683 0.79734868 0.78775886 0.78408981 0.78376086 0.78740717 0.7781152 ] mean value: 0.7836483929794602 key: test_accuracy value: [0.796875 0.84375 0.75 0.84375 0.77777778 0.80952381 0.82539683 0.77777778 0.80952381 0.84126984] mean value: 0.8075644841269841 key: train_accuracy value: [0.89473684 0.88596491 0.88596491 0.88947368 0.89842382 0.89316988 0.89141856 0.89141856 0.89316988 0.88791594] mean value: 0.8911656988355301 key: test_fscore value: [0.79365079 0.85714286 0.73333333 0.83333333 0.8 0.82352941 0.8358209 0.8 0.82352941 0.82758621] mean value: 0.812792624340867 key: train_fscore value: [0.89795918 0.88926746 0.88926746 0.89267462 0.9 0.89608177 0.89419795 0.89419795 0.89608177 0.89225589] mean value: 0.8941984063841519 key: test_precision value: [0.80645161 0.78947368 0.78571429 0.89285714 0.73684211 0.77777778 0.8 0.71794872 0.75675676 0.88888889] mean value: 0.795271097232048 key: train_precision value: [0.87128713 0.86423841 0.86423841 0.86754967 0.88474576 0.87086093 0.87043189 0.87333333 0.87375415 0.86038961] mean value: 0.870082929887785 key: test_recall value: [0.78125 0.9375 0.6875 0.78125 0.875 0.875 0.875 0.90322581 0.90322581 0.77419355] mean value: 0.8393145161290323 key: train_recall value: [0.92631579 0.91578947 0.91578947 0.91929825 0.91578947 0.92280702 0.91929825 0.91608392 0.91958042 0.92657343] mean value: 0.9197325481536007 key: test_roc_auc value: [0.796875 0.84375 0.75 0.84375 0.77620968 0.80846774 0.82459677 0.7797379 0.8109879 0.84022177] mean value: 0.8074596774193549 key: train_roc_auc value: [0.89473684 0.88596491 0.88596491 0.88947368 0.89845418 0.89322169 0.8914673 0.89137529 0.89312354 0.88784812] mean value: 0.8911630474788369 key: test_jcc value: [0.65789474 0.75 0.57894737 0.71428571 0.66666667 0.7 0.71794872 0.66666667 0.7 0.70588235] mean value: 0.68582922237721 key: train_jcc value: [0.81481481 0.8006135 0.8006135 0.80615385 0.81818182 0.8117284 0.80864198 0.80864198 0.8117284 0.80547112] mean value: 0.8086589338376311 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.29730058 0.30320644 0.42007184 0.30402303 0.30868077 0.32410741 0.30308056 0.30208421 0.33901119 0.33116007] mean value: 0.3232726097106934 key: score_time value: [0.01932836 0.01916742 0.01938939 0.01922774 0.01938105 0.02234888 0.01916742 0.01919007 0.01934958 0.01917791] mean value: 0.01957278251647949 key: test_mcc value: [0.5625 0.67253825 0.50395263 0.69293487 0.6385282 0.62325024 0.68352185 0.57596915 0.63159952 0.68740835] mean value: 0.6272203060525585 key: train_mcc value: [0.81295203 0.7971982 0.77330677 0.78033683 0.82587654 0.78775886 0.80483721 0.78376086 0.78740717 0.7781152 ] mean value: 0.7931549669716107 key: test_accuracy value: [0.78125 0.828125 0.75 0.84375 0.80952381 0.80952381 0.84126984 0.77777778 0.80952381 0.84126984] mean value: 0.8092013888888889 key: train_accuracy value: [0.90526316 0.89824561 0.88596491 0.88947368 0.91243433 0.89316988 0.90192644 0.89141856 0.89316988 0.88791594] mean value: 0.8958982394690755 key: test_fscore value: [0.78125 0.84507042 0.73333333 0.83333333 0.83333333 0.82352941 0.84848485 0.8 0.82352941 0.82758621] mean value: 0.8149450301446024 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:156: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:159: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.90878378 0.90034364 0.88926746 0.89267462 0.91438356 0.89608177 0.90410959 0.89419795 0.89608177 0.89225589] mean value: 0.8988180043360514 key: test_precision value: [0.78125 0.76923077 0.78571429 0.89285714 0.75 0.77777778 0.82352941 0.71794872 0.75675676 0.88888889] mean value: 0.7943953750939046 key: train_precision value: [0.8762215 0.88215488 0.86423841 0.86754967 0.89297659 0.87086093 0.88294314 0.87333333 0.87375415 0.86038961] mean value: 0.874442221613707 key: test_recall value: [0.78125 0.9375 0.6875 0.78125 0.9375 0.875 0.875 0.90322581 0.90322581 0.77419355] mean value: 0.8455645161290323 key: train_recall value: [0.94385965 0.91929825 0.91578947 0.91929825 0.93684211 0.92280702 0.92631579 0.91608392 0.91958042 0.92657343] mean value: 0.9246448288553551 key: test_roc_auc value: [0.78125 0.828125 0.75 0.84375 0.80745968 0.80846774 0.84072581 0.7797379 0.8109879 0.84022177] mean value: 0.8090725806451613 key: train_roc_auc value: [0.90526316 0.89824561 0.88596491 0.88947368 0.912477 0.89322169 0.90196908 0.89137529 0.89312354 0.88784812] mean value: 0.8958962090541038 key: test_jcc value: [0.64102564 0.73170732 0.57894737 0.71428571 0.71428571 0.7 0.73684211 0.66666667 0.7 0.70588235] mean value: 0.6889642879962294 key: train_jcc value: [0.83281734 0.81875 0.8006135 0.80615385 0.84227129 0.8117284 0.825 0.80864198 0.8117284 0.80547112] mean value: 0.8163175863975216 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.04284215 0.03778028 0.03733468 0.04465842 0.0438664 0.04482412 0.04302669 0.03571248 0.03619766 0.03719878] mean value: 0.04034416675567627 key: score_time value: [0.01585555 0.01863289 0.01600218 0.01202321 0.01203775 0.0120523 0.01193547 0.01202154 0.01468682 0.01500201] mean value: 0.014024972915649414 key: test_mcc value: [0.48566186 0.41736501 0.52223297 0.43519414 0.56521739 0.52623481 0.70164642 0.57396402 0.6092718 0.48566186] mean value: 0.5322450278028839 key: train_mcc value: [0.71565259 0.71638999 0.70571764 0.72484917 0.715385 0.69663647 0.72532357 0.68734715 0.7059816 0.70061391] mean value: 0.7093897093196634 key: test_accuracy value: [0.73913043 0.69565217 0.76086957 0.7173913 0.7826087 0.76086957 0.84782609 0.7826087 0.80434783 0.73913043] mean value: 0.7630434782608696 key: train_accuracy value: [0.85748792 0.85748792 0.852657 0.86231884 0.85748792 0.84782609 0.86231884 0.84299517 0.852657 0.85024155] mean value: 0.8543478260869566 key: test_fscore value: [0.76 0.74074074 0.75555556 0.71111111 0.7826087 0.74418605 0.85714286 0.8 0.8 0.71428571] mean value: 0.7665630720999781 key: train_fscore value: [0.86052009 0.8618267 0.85510689 0.86396181 0.85985748 0.85176471 0.86524823 0.84777518 0.85579196 0.85167464] mean value: 0.8573527688643722 key: test_precision value: [0.7037037 0.64516129 0.77272727 0.72727273 0.7826087 0.8 0.80769231 0.74074074 0.81818182 0.78947368] mean value: 0.7587562240503851 key: train_precision value: [0.84259259 0.83636364 0.8411215 0.85377358 0.84579439 0.83027523 0.84722222 0.82272727 0.83796296 0.8436019 ] mean value: 0.840143528471721 key: test_recall value: [0.82608696 0.86956522 0.73913043 0.69565217 0.7826087 0.69565217 0.91304348 0.86956522 0.7826087 0.65217391] mean value: 0.782608695652174 key: train_recall value: [0.87922705 0.88888889 0.86956522 0.87439614 0.87439614 0.87439614 0.88405797 0.87439614 0.87439614 0.85990338] mean value: 0.8753623188405797 key: test_roc_auc value: [0.73913043 0.69565217 0.76086957 0.7173913 0.7826087 0.76086957 0.84782609 0.7826087 0.80434783 0.73913043] mean value: 0.7630434782608695 key: train_roc_auc value: [0.85748792 0.85748792 0.852657 0.86231884 0.85748792 0.84782609 0.86231884 0.84299517 0.852657 0.85024155] mean value: 0.8543478260869565 key: test_jcc value: [0.61290323 0.58823529 0.60714286 0.55172414 0.64285714 0.59259259 0.75 0.66666667 0.66666667 0.55555556] mean value: 0.6234344139336615 key: train_jcc value: [0.75518672 0.75720165 0.74688797 0.7605042 0.75416667 0.74180328 0.7625 0.73577236 0.74793388 0.74166667] mean value: 0.7503623390610843 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.14505887 0.8046906 0.78343368 1.02523613 0.84160423 0.96089244 0.86398363 0.91612816 0.8715713 0.87083769] mean value: 0.9083436727523804 key: score_time value: [0.0148313 0.0152638 0.01667786 0.01521349 0.01547384 0.01542234 0.01566553 0.01552701 0.01535368 0.01540351] mean value: 0.015483236312866211 key: test_mcc value: [0.65465367 0.36514837 0.52223297 0.56521739 0.65465367 0.47826087 0.61394061 0.47245559 0.6092718 0.57396402] mean value: 0.5509798963838973 key: train_mcc value: [0.89376152 0.85025147 0.86538081 0.77792303 0.83116038 0.88422307 0.84556851 0.90822316 0.84556851 0.72977523] mean value: 0.8431835698524185 key: test_accuracy value: [0.82608696 0.67391304 0.76086957 0.7826087 0.82608696 0.73913043 0.80434783 0.7173913 0.80434783 0.7826087 ] mean value: 0.7717391304347826 key: train_accuracy value: [0.9468599 0.92512077 0.93236715 0.88888889 0.91545894 0.94202899 0.92270531 0.95410628 0.92270531 0.8647343 ] mean value: 0.9214975845410628 key: test_fscore value: [0.83333333 0.71698113 0.75555556 0.7826087 0.81818182 0.73913043 0.81632653 0.76363636 0.8 0.76190476] mean value: 0.7787658625734332 key: train_fscore value: [0.94711538 0.92493947 0.93364929 0.88995215 0.91646778 0.94258373 0.92344498 0.95399516 0.92344498 0.86666667] mean value: 0.9222259582829083 key: test_precision value: [0.8 0.63333333 0.77272727 0.7826087 0.85714286 0.73913043 0.76923077 0.65625 0.81818182 0.84210526] mean value: 0.7670710444208728 key: train_precision value: [0.94258373 0.92718447 0.91627907 0.88151659 0.90566038 0.93364929 0.91469194 0.95631068 0.91469194 0.85446009] mean value: 0.9147028181744306 key: test_recall value: [0.86956522 0.82608696 0.73913043 0.7826087 0.7826087 0.73913043 0.86956522 0.91304348 0.7826087 0.69565217] mean value: 0.8 key: train_recall value: [0.95169082 0.92270531 0.95169082 0.89855072 0.92753623 0.95169082 0.93236715 0.95169082 0.93236715 0.87922705] mean value: 0.9299516908212561 key: test_roc_auc value: [0.82608696 0.67391304 0.76086957 0.7826087 0.82608696 0.73913043 0.80434783 0.7173913 0.80434783 0.7826087 ] mean value: 0.7717391304347826 key: train_roc_auc value: [0.9468599 0.92512077 0.93236715 0.88888889 0.91545894 0.94202899 0.92270531 0.95410628 0.92270531 0.8647343 ] mean value: 0.9214975845410629 key: test_jcc value: [0.71428571 0.55882353 0.60714286 0.64285714 0.69230769 0.5862069 0.68965517 0.61764706 0.66666667 0.61538462] mean value: 0.63909773458455 key: train_jcc value: [0.89954338 0.86036036 0.87555556 0.80172414 0.84581498 0.89140271 0.85777778 0.91203704 0.85777778 0.76470588] mean value: 0.8566699600693612 MCC on Blind test: 0.44 Accuracy on Blind test: 0.72 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01417565 0.01006126 0.00992203 0.00977826 0.00981617 0.00975299 0.00976253 0.00992513 0.00973701 0.00970101] mean value: 0.010263204574584961 key: score_time value: [0.01206207 0.00936532 0.00897574 0.00889325 0.00894952 0.00891542 0.00892544 0.00896716 0.00890231 0.00891733] mean value: 0.009287357330322266 key: test_mcc value: [0.35082321 0.30550505 0.30905755 0.52223297 0.36514837 0.48007936 0.52223297 0.4454354 0.45643546 0.39735971] mean value: 0.4154310044853051 key: train_mcc value: [0.44211758 0.42423178 0.44989455 0.4784619 0.44369133 0.48007936 0.4517191 0.442711 0.4517191 0.45745478] mean value: 0.45220804789426783 key: test_accuracy value: [0.67391304 0.65217391 0.65217391 0.76086957 0.67391304 0.73913043 0.76086957 0.7173913 0.7173913 0.69565217] mean value: 0.7043478260869566 key: train_accuracy value: [0.71980676 0.70531401 0.71980676 0.73913043 0.71980676 0.73913043 0.72222222 0.7173913 0.72222222 0.72705314] mean value: 0.7231884057971014 key: test_fscore value: [0.65116279 0.63636364 0.61904762 0.76595745 0.61538462 0.75 0.75555556 0.68292683 0.66666667 0.66666667] mean value: 0.6809731826459238 key: train_fscore value: [0.70408163 0.66298343 0.68648649 0.74285714 0.69948187 0.75 0.69496021 0.688 0.69496021 0.70951157] mean value: 0.7033322545222606 key: test_precision value: [0.7 0.66666667 0.68421053 0.75 0.75 0.72 0.77272727 0.77777778 0.8125 0.73684211] mean value: 0.7370724348750665 key: train_precision value: [0.74594595 0.77419355 0.7791411 0.73239437 0.75418994 0.72 0.77058824 0.76785714 0.77058824 0.75824176] mean value: 0.7573140280645919 key: test_recall value: [0.60869565 0.60869565 0.56521739 0.7826087 0.52173913 0.7826087 0.73913043 0.60869565 0.56521739 0.60869565] mean value: 0.6391304347826087 key: train_recall value: [0.66666667 0.57971014 0.61352657 0.75362319 0.65217391 0.7826087 0.63285024 0.62318841 0.63285024 0.66666667] mean value: 0.6603864734299517 key: test_roc_auc value: [0.67391304 0.65217391 0.65217391 0.76086957 0.67391304 0.73913043 0.76086957 0.7173913 0.7173913 0.69565217] mean value: 0.7043478260869565 key: train_roc_auc value: [0.71980676 0.70531401 0.71980676 0.73913043 0.71980676 0.73913043 0.72222222 0.7173913 0.72222222 0.72705314] mean value: 0.7231884057971014 key: test_jcc value: [0.48275862 0.46666667 0.44827586 0.62068966 0.44444444 0.6 0.60714286 0.51851852 0.5 0.5 ] mean value: 0.5188496624703521 key: train_jcc value: [0.54330709 0.49586777 0.52263374 0.59090909 0.53784861 0.6 0.53252033 0.52439024 0.53252033 0.5498008 ] mean value: 0.5429797987673654 MCC on Blind test: 0.35 Accuracy on Blind test: 0.69 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01003051 0.01000547 0.01006913 0.00997734 0.01001358 0.01002645 0.01001191 0.00994897 0.00999188 0.00996375] mean value: 0.010003900527954102 key: score_time value: [0.00892138 0.00891447 0.00889754 0.00898504 0.00888562 0.008883 0.00893068 0.00893736 0.00896692 0.00892067] mean value: 0.008924269676208496 key: test_mcc value: [0.26311741 0.26111648 0.30434783 0.30905755 0.34815531 0.4454354 0.48566186 0.41736501 0.47826087 0.43519414] mean value: 0.37477118607890214 key: train_mcc value: [0.4977937 0.54289671 0.52179393 0.51693234 0.51729468 0.49582377 0.4882875 0.47503462 0.49992343 0.47370088] mean value: 0.5029481566309334 key: test_accuracy value: [0.63043478 0.63043478 0.65217391 0.65217391 0.67391304 0.7173913 0.73913043 0.69565217 0.73913043 0.7173913 ] mean value: 0.6847826086956521 key: train_accuracy value: [0.74879227 0.7705314 0.76086957 0.75845411 0.75845411 0.74637681 0.74396135 0.73671498 0.74879227 0.73671498] mean value: 0.7509661835748792 key: test_fscore value: [0.65306122 0.63829787 0.65217391 0.61904762 0.66666667 0.74509804 0.76 0.74074074 0.73913043 0.71111111] mean value: 0.6925327621438132 key: train_fscore value: [0.75238095 0.77958237 0.76258993 0.75728155 0.76303318 0.7597254 0.74881517 0.74709977 0.76036866 0.73218673] mean value: 0.7563063705878426 key: test_precision value: [0.61538462 0.625 0.65217391 0.68421053 0.68181818 0.67857143 0.7037037 0.64516129 0.73913043 0.72727273] mean value: 0.6752426821215114 key: train_precision value: [0.74178404 0.75 0.75714286 0.76097561 0.74883721 0.72173913 0.73488372 0.71875 0.72687225 0.745 ] mean value: 0.7405984811821016 key: test_recall value: [0.69565217 0.65217391 0.65217391 0.56521739 0.65217391 0.82608696 0.82608696 0.86956522 0.73913043 0.69565217] mean value: 0.717391304347826 key: train_recall value: [0.76328502 0.8115942 0.76811594 0.75362319 0.77777778 0.80193237 0.76328502 0.77777778 0.79710145 0.71980676] mean value: 0.7734299516908213 key: test_roc_auc value: [0.63043478 0.63043478 0.65217391 0.65217391 0.67391304 0.7173913 0.73913043 0.69565217 0.73913043 0.7173913 ] mean value: 0.6847826086956521 key: train_roc_auc value: [0.74879227 0.7705314 0.76086957 0.75845411 0.75845411 0.74637681 0.74396135 0.73671498 0.74879227 0.73671498] mean value: 0.7509661835748792 key: test_jcc value: [0.48484848 0.46875 0.48387097 0.44827586 0.5 0.59375 0.61290323 0.58823529 0.5862069 0.55172414] mean value: 0.5318564869066243 key: train_jcc value: [0.60305344 0.63878327 0.61627907 0.609375 0.61685824 0.61254613 0.59848485 0.5962963 0.6133829 0.57751938] mean value: 0.6082578562107429 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00986171 0.00919127 0.01064014 0.00955677 0.01044106 0.01053977 0.01064229 0.01063395 0.01053476 0.01057172] mean value: 0.010261344909667968 key: score_time value: [0.01187658 0.01271725 0.01210403 0.01190686 0.01244688 0.01247811 0.0124712 0.01251221 0.01339221 0.01789713] mean value: 0.012980246543884277 key: test_mcc value: [0.21821789 0. 0.22075539 0.34815531 0.22518867 0.34815531 0.1351132 0.04415108 0.08908708 0.35082321] mean value: 0.1979647153120915 key: train_mcc value: [0.58588257 0.59520387 0.53645729 0.52777851 0.52444489 0.51751244 0.55578908 0.57512087 0.57085489 0.52857222] mean value: 0.5517616648647189 key: test_accuracy value: [0.60869565 0.5 0.60869565 0.67391304 0.60869565 0.67391304 0.56521739 0.52173913 0.54347826 0.67391304] mean value: 0.5978260869565217 key: train_accuracy value: [0.79227053 0.79710145 0.76811594 0.76328502 0.76086957 0.75845411 0.77777778 0.78743961 0.78502415 0.76328502] mean value: 0.7753623188405797 key: test_fscore value: [0.625 0.54901961 0.57142857 0.66666667 0.65384615 0.66666667 0.61538462 0.56 0.48780488 0.69387755] mean value: 0.6089694710905 key: train_fscore value: [0.79906542 0.8028169 0.77142857 0.77102804 0.77241379 0.76415094 0.78095238 0.79047619 0.79058824 0.77314815] mean value: 0.7816068622151459 key: test_precision value: [0.6 0.5 0.63157895 0.68181818 0.5862069 0.68181818 0.55172414 0.51851852 0.55555556 0.65384615] mean value: 0.5961066573407771 key: train_precision value: [0.77375566 0.78082192 0.76056338 0.74660633 0.73684211 0.74654378 0.76995305 0.77934272 0.7706422 0.74222222] mean value: 0.7607293371810109 key: test_recall value: [0.65217391 0.60869565 0.52173913 0.65217391 0.73913043 0.65217391 0.69565217 0.60869565 0.43478261 0.73913043] mean value: 0.6304347826086957 key: train_recall value: [0.82608696 0.82608696 0.7826087 0.79710145 0.8115942 0.7826087 0.79227053 0.80193237 0.8115942 0.80676329] mean value: 0.8038647342995169 key: test_roc_auc value: [0.60869565 0.5 0.60869565 0.67391304 0.60869565 0.67391304 0.56521739 0.52173913 0.54347826 0.67391304] mean value: 0.5978260869565217 key: train_roc_auc value: [0.79227053 0.79710145 0.76811594 0.76328502 0.76086957 0.75845411 0.77777778 0.78743961 0.78502415 0.76328502] mean value: 0.7753623188405797 key: test_jcc value: [0.45454545 0.37837838 0.4 0.5 0.48571429 0.5 0.44444444 0.38888889 0.32258065 0.53125 ] mean value: 0.4405802097132742 key: train_jcc value: [0.66536965 0.67058824 0.62790698 0.62737643 0.62921348 0.61832061 0.640625 0.65354331 0.6536965 0.63018868] mean value: 0.6416828865918727 MCC on Blind test: 0.27 Accuracy on Blind test: 0.64 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02020502 0.01990628 0.02276731 0.02351856 0.02368045 0.02389264 0.02385902 0.02400041 0.02121592 0.02006316] mean value: 0.022310876846313478 key: score_time value: [0.01165295 0.01172495 0.01302648 0.0129962 0.0127008 0.0130198 0.01305652 0.01332688 0.01172519 0.01192451] mean value: 0.012515425682067871 key: test_mcc value: [0.45643546 0.40533961 0.39130435 0.43519414 0.43519414 0.56736651 0.49541508 0.69560834 0.52623481 0.52623481] mean value: 0.4934327271836572 key: train_mcc value: [0.66295066 0.72998103 0.6576267 0.71739923 0.67465395 0.70830403 0.67772632 0.68747778 0.67427068 0.70607487] mean value: 0.6896465241231877 key: test_accuracy value: [0.7173913 0.69565217 0.69565217 0.7173913 0.7173913 0.7826087 0.73913043 0.82608696 0.76086957 0.76086957] mean value: 0.741304347826087 key: train_accuracy value: [0.83091787 0.86231884 0.82850242 0.85748792 0.83574879 0.852657 0.83574879 0.84057971 0.83333333 0.85024155] mean value: 0.8427536231884059 key: test_fscore value: [0.75471698 0.73076923 0.69565217 0.71111111 0.72340426 0.79166667 0.76923077 0.85185185 0.74418605 0.74418605] mean value: 0.7516775133017153 key: train_fscore value: [0.83568075 0.87015945 0.8321513 0.86310905 0.84331797 0.8591224 0.84615385 0.85067873 0.84494382 0.85909091] mean value: 0.8504408236135929 key: test_precision value: [0.66666667 0.65517241 0.69565217 0.72727273 0.70833333 0.76 0.68965517 0.74193548 0.8 0.8 ] mean value: 0.7244687971263635 key: train_precision value: [0.81278539 0.82327586 0.81481481 0.83035714 0.8061674 0.82300885 0.79574468 0.8 0.78991597 0.8111588 ] mean value: 0.8107228903828236 key: test_recall value: [0.86956522 0.82608696 0.69565217 0.69565217 0.73913043 0.82608696 0.86956522 1. 0.69565217 0.69565217] mean value: 0.7913043478260869 key: train_recall value: [0.85990338 0.92270531 0.85024155 0.89855072 0.88405797 0.89855072 0.90338164 0.90821256 0.90821256 0.91304348] mean value: 0.8946859903381643 key: test_roc_auc value: [0.7173913 0.69565217 0.69565217 0.7173913 0.7173913 0.7826087 0.73913043 0.82608696 0.76086957 0.76086957] mean value: 0.741304347826087 key: train_roc_auc value: [0.83091787 0.86231884 0.82850242 0.85748792 0.83574879 0.852657 0.83574879 0.84057971 0.83333333 0.85024155] mean value: 0.8427536231884059 key: test_jcc value: [0.60606061 0.57575758 0.53333333 0.55172414 0.56666667 0.65517241 0.625 0.74193548 0.59259259 0.59259259] mean value: 0.6040835402598472 key: train_jcc value: [0.71774194 0.77016129 0.71255061 0.75918367 0.72908367 0.75303644 0.73333333 0.74015748 0.73151751 0.75298805] mean value: 0.7399753980333583 MCC on Blind test: 0.43 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.48387837 1.04218054 0.88318419 0.63158584 1.02904654 1.14515352 1.52221656 0.689291 1.10090923 1.16517997] mean value: 1.0692625761032104 key: score_time value: [0.01243424 0.01250696 0.01239061 0.01240277 0.01242542 0.01239514 0.01235199 0.01239777 0.01241636 0.01621032] mean value: 0.012793159484863282 key: test_mcc value: [0.4454354 0.47245559 0.52623481 0.48007936 0.47826087 0.43519414 0.57396402 0.4454354 0.53452248 0.6092718 ] mean value: 0.5000853878258789 key: train_mcc value: [0.83310808 0.72761502 0.74082672 0.64112383 0.79272392 0.77447567 0.86965986 0.67772632 0.74562472 0.80533198] mean value: 0.7608216120675093 key: test_accuracy value: [0.7173913 0.7173913 0.76086957 0.73913043 0.73913043 0.7173913 0.7826087 0.7173913 0.76086957 0.80434783] mean value: 0.7456521739130435 key: train_accuracy value: [0.91545894 0.85024155 0.86956522 0.80434783 0.89613527 0.88647343 0.93236715 0.83574879 0.85990338 0.90096618] mean value: 0.87512077294686 key: test_fscore value: [0.74509804 0.76363636 0.74418605 0.75 0.73913043 0.71111111 0.8 0.74509804 0.78431373 0.8 ] mean value: 0.7582573759963279 key: train_fscore value: [0.91841492 0.86808511 0.865 0.8308977 0.89786223 0.88992974 0.93577982 0.84615385 0.87606838 0.90531178] mean value: 0.883350352054179 key: test_precision value: [0.67857143 0.65625 0.8 0.72 0.73913043 0.72727273 0.74074074 0.67857143 0.71428571 0.81818182] mean value: 0.7273004292406466 key: train_precision value: [0.88738739 0.7756654 0.89637306 0.73161765 0.88317757 0.86363636 0.89082969 0.79574468 0.78544061 0.86725664] mean value: 0.8377129049779565 key: test_recall value: [0.82608696 0.91304348 0.69565217 0.7826087 0.73913043 0.69565217 0.86956522 0.82608696 0.86956522 0.7826087 ] mean value: 0.8 key: train_recall value: [0.95169082 0.98550725 0.83574879 0.96135266 0.91304348 0.9178744 0.98550725 0.90338164 0.99033816 0.9468599 ] mean value: 0.9391304347826087 key: test_roc_auc value: [0.7173913 0.7173913 0.76086957 0.73913043 0.73913043 0.7173913 0.7826087 0.7173913 0.76086957 0.80434783] mean value: 0.7456521739130435 key: train_roc_auc value: [0.91545894 0.85024155 0.86956522 0.80434783 0.89613527 0.88647343 0.93236715 0.83574879 0.85990338 0.90096618] mean value: 0.87512077294686 key: test_jcc value: [0.59375 0.61764706 0.59259259 0.6 0.5862069 0.55172414 0.66666667 0.59375 0.64516129 0.66666667] mean value: 0.6114165309554794 key: train_jcc value: [0.84913793 0.76691729 0.76211454 0.71071429 0.81465517 0.80168776 0.87931034 0.73333333 0.77946768 0.82700422] mean value: 0.7924342561732226 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02884746 0.02460098 0.02956152 0.0261054 0.02656984 0.02446222 0.02462029 0.02391934 0.02574706 0.02467108] mean value: 0.02591052055358887 key: score_time value: [0.01191378 0.00934911 0.00957465 0.00892711 0.00892615 0.00891709 0.00895023 0.00892878 0.0089879 0.00895309] mean value: 0.009342789649963379 key: test_mcc value: [0.74194083 0.62360956 0.6092718 0.78935222 0.82608696 0.78935222 0.75056834 0.65465367 0.69631062 0.91304348] mean value: 0.7394189686845861 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.80434783 0.80434783 0.89130435 0.91304348 0.89130435 0.86956522 0.82608696 0.84782609 0.95652174] mean value: 0.8673913043478261 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.875 0.82352941 0.80851064 0.88372093 0.91304348 0.89795918 0.88 0.83333333 0.85106383 0.95652174] mean value: 0.8722682544480478 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.84 0.75 0.79166667 0.95 0.91304348 0.84615385 0.81481481 0.8 0.83333333 0.95652174] mean value: 0.8495533878359965 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.91304348 0.91304348 0.82608696 0.82608696 0.91304348 0.95652174 0.95652174 0.86956522 0.86956522 0.95652174] mean value: 0.9 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86956522 0.80434783 0.80434783 0.89130435 0.91304348 0.89130435 0.86956522 0.82608696 0.84782609 0.95652174] mean value: 0.8673913043478261 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.77777778 0.7 0.67857143 0.79166667 0.84 0.81481481 0.78571429 0.71428571 0.74074074 0.91666667] mean value: 0.7760238095238094 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.56 Accuracy on Blind test: 0.78 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.12353754 0.12308574 0.12323904 0.12355566 0.12350321 0.12364197 0.12468982 0.1238718 0.12950373 0.12978148] mean value: 0.12484099864959716 key: score_time value: [0.01787996 0.01805329 0.01808262 0.0181849 0.01806045 0.01811862 0.0181942 0.0181005 0.01856756 0.01836109] mean value: 0.018160319328308104 key: test_mcc value: [0.56736651 0.24140227 0.43852901 0.61394061 0.52223297 0.48007936 0.52623481 0.4454354 0.48007936 0.48007936] mean value: 0.47953796707708835 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 0.60869565 0.7173913 0.80434783 0.76086957 0.73913043 0.76086957 0.7173913 0.73913043 0.73913043] mean value: 0.7369565217391304 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.79166667 0.67857143 0.69767442 0.79069767 0.76595745 0.72727273 0.74418605 0.74509804 0.72727273 0.72727273] mean value: 0.7395669902615358 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76 0.57575758 0.75 0.85 0.75 0.76190476 0.8 0.67857143 0.76190476 0.76190476] mean value: 0.745004329004329 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.82608696 0.82608696 0.65217391 0.73913043 0.7826087 0.69565217 0.69565217 0.82608696 0.69565217 0.69565217] mean value: 0.7434782608695653 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.7826087 0.60869565 0.7173913 0.80434783 0.76086957 0.73913043 0.76086957 0.7173913 0.73913043 0.73913043] mean value: 0.7369565217391304 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.65517241 0.51351351 0.53571429 0.65384615 0.62068966 0.57142857 0.59259259 0.59375 0.57142857 0.57142857] mean value: 0.5879564328917777 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.46 Accuracy on Blind test: 0.74 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01158834 0.01016188 0.01072598 0.01104307 0.01108384 0.01038527 0.01136303 0.01132059 0.01050973 0.01084805] mean value: 0.010902976989746094 key: score_time value: [0.00946069 0.00910878 0.00992489 0.00964308 0.00900888 0.00962877 0.00933528 0.0089159 0.00932121 0.0091238 ] mean value: 0.009347128868103027 key: test_mcc value: [ 0.35082321 -0.04347826 0.26726124 0.17407766 0.39735971 0.26311741 0.49541508 0.13043478 0.22518867 0.30434783] mean value: 0.2564547324906315 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.67391304 0.47826087 0.63043478 0.58695652 0.69565217 0.63043478 0.73913043 0.56521739 0.60869565 0.65217391] mean value: 0.6260869565217392 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.65116279 0.47826087 0.66666667 0.57777778 0.66666667 0.65306122 0.76923077 0.56521739 0.55 0.65217391] mean value: 0.6230218069442394 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.47826087 0.60714286 0.59090909 0.73684211 0.61538462 0.68965517 0.56521739 0.64705882 0.65217391] mean value: 0.628264483855597 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.60869565 0.47826087 0.73913043 0.56521739 0.60869565 0.69565217 0.86956522 0.56521739 0.47826087 0.65217391] mean value: 0.6260869565217391 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.67391304 0.47826087 0.63043478 0.58695652 0.69565217 0.63043478 0.73913043 0.56521739 0.60869565 0.65217391] mean value: 0.6260869565217391 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.48275862 0.31428571 0.5 0.40625 0.5 0.48484848 0.625 0.39393939 0.37931034 0.48387097] mean value: 0.457026352633277 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.23 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.75904489 1.80482984 1.76845455 1.77710342 1.78683972 1.7867794 1.81268549 1.771945 1.7612524 1.76990128] mean value: 1.7798835992813111 key: score_time value: [0.09351516 0.10056996 0.09473276 0.09771657 0.09969068 0.10154796 0.10460901 0.09457302 0.09260321 0.15258074] mean value: 0.10321390628814697 key: test_mcc value: [0.91651514 0.56694671 0.56521739 0.78935222 0.78334945 0.74194083 0.74194083 0.52223297 0.75056834 0.78334945] mean value: 0.7161413317928996 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95652174 0.76086957 0.7826087 0.89130435 0.89130435 0.86956522 0.86956522 0.76086957 0.86956522 0.89130435] mean value: 0.8543478260869565 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95833333 0.8 0.7826087 0.88372093 0.89361702 0.875 0.875 0.76595745 0.85714286 0.88888889] mean value: 0.8580269173334918 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.92 0.6875 0.7826087 0.95 0.875 0.84 0.84 0.75 0.94736842 0.90909091] mean value: 0.8501568025795715 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.95652174 0.7826087 0.82608696 0.91304348 0.91304348 0.91304348 0.7826087 0.7826087 0.86956522] mean value: 0.8739130434782608 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95652174 0.76086957 0.7826087 0.89130435 0.89130435 0.86956522 0.86956522 0.76086957 0.86956522 0.89130435] mean value: 0.8543478260869566 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.92 0.66666667 0.64285714 0.79166667 0.80769231 0.77777778 0.77777778 0.62068966 0.75 0.8 ] mean value: 0.7555127994610753 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.81 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.9309938 0.97561812 0.97212815 0.9395082 0.96008706 0.96088171 0.955127 0.95128441 0.90736485 0.98695827] mean value: 0.953995156288147 key: score_time value: [0.22832513 0.25179029 0.13333225 0.21760082 0.20836687 0.2838335 0.25420737 0.27094269 0.22470903 0.20994496] mean value: 0.2283052921295166 key: test_mcc value: [0.87705802 0.54772256 0.65217391 0.82922798 0.73913043 0.74194083 0.78334945 0.61394061 0.78935222 0.78334945] mean value: 0.7357245468829955 key: train_mcc value: [0.93275907 0.93306423 0.94273329 0.93773439 0.92780693 0.93773439 0.92841417 0.94242496 0.91372611 0.92806703] mean value: 0.9324464576360868 key: test_accuracy value: [0.93478261 0.76086957 0.82608696 0.91304348 0.86956522 0.86956522 0.89130435 0.80434783 0.89130435 0.89130435] mean value: 0.8652173913043478 key: train_accuracy value: [0.96618357 0.96618357 0.97101449 0.96859903 0.96376812 0.96859903 0.96376812 0.97101449 0.95652174 0.96376812] mean value: 0.9659420289855072 key: test_fscore value: [0.93877551 0.79245283 0.82608696 0.90909091 0.86956522 0.875 0.89361702 0.81632653 0.88372093 0.88888889] mean value: 0.8693524794407002 key: train_fscore value: [0.96666667 0.96682464 0.97156398 0.96912114 0.96420048 0.96912114 0.96453901 0.97142857 0.95734597 0.96437055] mean value: 0.9665182146274129 key: test_precision value: [0.88461538 0.7 0.82608696 0.95238095 0.86956522 0.84 0.875 0.76923077 0.95 0.90909091] mean value: 0.8575970189231059 key: train_precision value: [0.95305164 0.94883721 0.95348837 0.95327103 0.95283019 0.95327103 0.94444444 0.95774648 0.93953488 0.94859813] mean value: 0.9505073407221585 key: test_recall value: [1. 0.91304348 0.82608696 0.86956522 0.86956522 0.91304348 0.91304348 0.86956522 0.82608696 0.86956522] mean value: 0.8869565217391304 key: train_recall value: [0.98067633 0.98550725 0.99033816 0.98550725 0.97584541 0.98550725 0.98550725 0.98550725 0.97584541 0.98067633] mean value: 0.9830917874396136 key: test_roc_auc value: [0.93478261 0.76086957 0.82608696 0.91304348 0.86956522 0.86956522 0.89130435 0.80434783 0.89130435 0.89130435] mean value: 0.8652173913043478 key: train_roc_auc value: [0.96618357 0.96618357 0.97101449 0.96859903 0.96376812 0.96859903 0.96376812 0.97101449 0.95652174 0.96376812] mean value: 0.9659420289855072 key: test_jcc value: [0.88461538 0.65625 0.7037037 0.83333333 0.76923077 0.77777778 0.80769231 0.68965517 0.79166667 0.8 ] mean value: 0.7713925115433736 key: train_jcc value: [0.93548387 0.93577982 0.94470046 0.94009217 0.93087558 0.94009217 0.93150685 0.94444444 0.91818182 0.93119266] mean value: 0.9352349828636888 MCC on Blind test: 0.65 Accuracy on Blind test: 0.83 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02340174 0.00980616 0.00984836 0.00980854 0.01029038 0.01004672 0.0101068 0.01000142 0.00982833 0.00993443] mean value: 0.011307287216186523 key: score_time value: [0.01417089 0.00880098 0.00897098 0.00886726 0.00935221 0.00891757 0.00887465 0.0089643 0.00900984 0.008883 ] mean value: 0.009481167793273926 key: test_mcc value: [0.26311741 0.26111648 0.30434783 0.30905755 0.34815531 0.4454354 0.48566186 0.41736501 0.47826087 0.43519414] mean value: 0.37477118607890214 key: train_mcc value: [0.4977937 0.54289671 0.52179393 0.51693234 0.51729468 0.49582377 0.4882875 0.47503462 0.49992343 0.47370088] mean value: 0.5029481566309334 key: test_accuracy value: [0.63043478 0.63043478 0.65217391 0.65217391 0.67391304 0.7173913 0.73913043 0.69565217 0.73913043 0.7173913 ] mean value: 0.6847826086956521 key: train_accuracy value: [0.74879227 0.7705314 0.76086957 0.75845411 0.75845411 0.74637681 0.74396135 0.73671498 0.74879227 0.73671498] mean value: 0.7509661835748792 key: test_fscore value: [0.65306122 0.63829787 0.65217391 0.61904762 0.66666667 0.74509804 0.76 0.74074074 0.73913043 0.71111111] mean value: 0.6925327621438132 key: train_fscore value: [0.75238095 0.77958237 0.76258993 0.75728155 0.76303318 0.7597254 0.74881517 0.74709977 0.76036866 0.73218673] mean value: 0.7563063705878426 key: test_precision value: [0.61538462 0.625 0.65217391 0.68421053 0.68181818 0.67857143 0.7037037 0.64516129 0.73913043 0.72727273] mean value: 0.6752426821215114 key: train_precision value: [0.74178404 0.75 0.75714286 0.76097561 0.74883721 0.72173913 0.73488372 0.71875 0.72687225 0.745 ] mean value: 0.7405984811821016 key: test_recall value: [0.69565217 0.65217391 0.65217391 0.56521739 0.65217391 0.82608696 0.82608696 0.86956522 0.73913043 0.69565217] mean value: 0.717391304347826 key: train_recall value: [0.76328502 0.8115942 0.76811594 0.75362319 0.77777778 0.80193237 0.76328502 0.77777778 0.79710145 0.71980676] mean value: 0.7734299516908213 key: test_roc_auc value: [0.63043478 0.63043478 0.65217391 0.65217391 0.67391304 0.7173913 0.73913043 0.69565217 0.73913043 0.7173913 ] mean value: 0.6847826086956521 key: train_roc_auc value: [0.74879227 0.7705314 0.76086957 0.75845411 0.75845411 0.74637681 0.74396135 0.73671498 0.74879227 0.73671498] mean value: 0.7509661835748792 key: test_jcc value: [0.48484848 0.46875 0.48387097 0.44827586 0.5 0.59375 0.61290323 0.58823529 0.5862069 0.55172414] mean value: 0.5318564869066243 key: train_jcc value: [0.60305344 0.63878327 0.61627907 0.609375 0.61685824 0.61254613 0.59848485 0.5962963 0.6133829 0.57751938] mean value: 0.6082578562107429 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08936453 0.07846737 0.07669997 0.08492184 0.08044219 0.08309031 0.10936308 0.0800128 0.08465981 0.22779965] mean value: 0.09948215484619141 key: score_time value: [0.01102686 0.01109076 0.01118302 0.01103806 0.01191735 0.01103616 0.01134872 0.01110458 0.01118922 0.01178503] mean value: 0.011271977424621582 key: test_mcc value: [0.87038828 0.74194083 0.65465367 0.73913043 0.87038828 0.82922798 0.87705802 0.61394061 0.74194083 1. ] mean value: 0.7938668934371033 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93478261 0.86956522 0.82608696 0.86956522 0.93478261 0.91304348 0.93478261 0.80434783 0.86956522 1. ] mean value: 0.8956521739130435 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93617021 0.875 0.83333333 0.86956522 0.93333333 0.91666667 0.93877551 0.81632653 0.86363636 1. ] mean value: 0.8982807167943285 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91666667 0.84 0.8 0.86956522 0.95454545 0.88 0.88461538 0.76923077 0.9047619 1. ] mean value: 0.8819385397211484 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.95652174 0.91304348 0.86956522 0.86956522 0.91304348 0.95652174 1. 0.86956522 0.82608696 1. ] mean value: 0.9173913043478261 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93478261 0.86956522 0.82608696 0.86956522 0.93478261 0.91304348 0.93478261 0.80434783 0.86956522 1. ] mean value: 0.8956521739130434 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88 0.77777778 0.71428571 0.76923077 0.875 0.84615385 0.88461538 0.68965517 0.76 1. ] mean value: 0.8196718664477285 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04487348 0.08185077 0.05841994 0.07512474 0.07208538 0.08272529 0.06066704 0.03999424 0.04051352 0.0768981 ] mean value: 0.06331524848937989 key: score_time value: [0.02183747 0.02446556 0.02108097 0.01225448 0.02332473 0.02089667 0.01228571 0.01221514 0.01829672 0.02186108] mean value: 0.018851852416992186 key: test_mcc value: [0.65465367 0.37796447 0.43519414 0.61394061 0.48007936 0.48007936 0.56521739 0.57396402 0.52223297 0.69631062] mean value: 0.5399636618548821 key: train_mcc value: [0.79844422 0.83610009 0.85048969 0.79302043 0.84082503 0.86039548 0.83116038 0.82149572 0.85507246 0.81737947] mean value: 0.8304382976209043 key: test_accuracy value: [0.82608696 0.67391304 0.7173913 0.80434783 0.73913043 0.73913043 0.7826087 0.7826087 0.76086957 0.84782609] mean value: 0.7673913043478261 key: train_accuracy value: [0.89855072 0.9178744 0.92512077 0.89613527 0.92028986 0.92995169 0.91545894 0.91062802 0.92753623 0.90821256] mean value: 0.914975845410628 key: test_fscore value: [0.83333333 0.72727273 0.71111111 0.81632653 0.75 0.75 0.7826087 0.8 0.75555556 0.85106383] mean value: 0.777727178332438 key: train_fscore value: [0.90140845 0.91904762 0.92601432 0.89834515 0.92124105 0.93111639 0.91646778 0.91169451 0.92753623 0.91037736] mean value: 0.9163248864437317 key: test_precision value: [0.8 0.625 0.72727273 0.76923077 0.72 0.72 0.7826087 0.74074074 0.77272727 0.83333333] mean value: 0.7490913538957017 key: train_precision value: [0.87671233 0.90610329 0.91509434 0.87962963 0.91037736 0.91588785 0.90566038 0.9009434 0.92753623 0.88940092] mean value: 0.9027345720490176 key: test_recall value: [0.86956522 0.86956522 0.69565217 0.86956522 0.7826087 0.7826087 0.7826087 0.86956522 0.73913043 0.86956522] mean value: 0.8130434782608695 key: train_recall value: [0.92753623 0.93236715 0.93719807 0.9178744 0.93236715 0.9468599 0.92753623 0.92270531 0.92753623 0.93236715] mean value: 0.9304347826086956 key: test_roc_auc value: [0.82608696 0.67391304 0.7173913 0.80434783 0.73913043 0.73913043 0.7826087 0.7826087 0.76086957 0.84782609] mean value: 0.7673913043478261 key: train_roc_auc value: [0.89855072 0.9178744 0.92512077 0.89613527 0.92028986 0.92995169 0.91545894 0.91062802 0.92753623 0.90821256] mean value: 0.914975845410628 key: test_jcc value: [0.71428571 0.57142857 0.55172414 0.68965517 0.6 0.6 0.64285714 0.66666667 0.60714286 0.74074074] mean value: 0.6384501003466521 key: train_jcc value: [0.82051282 0.85022026 0.86222222 0.81545064 0.8539823 0.87111111 0.84581498 0.8377193 0.86486486 0.83549784] mean value: 0.8457396339406997 MCC on Blind test: 0.4 Accuracy on Blind test: 0.7 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.03631878 0.0097506 0.00945354 0.00935531 0.00948024 0.00942874 0.0097692 0.00947857 0.00951481 0.00936866] mean value: 0.012191843986511231 key: score_time value: [0.00945902 0.00890803 0.00869727 0.00870514 0.00875044 0.00870657 0.00897241 0.008641 0.00870872 0.00867629] mean value: 0.008822488784790038 key: test_mcc value: [0.35082321 0.43852901 0.43519414 0.52623481 0.43852901 0.52623481 0.71269665 0.57396402 0.48566186 0.39130435] mean value: 0.4879171868665441 key: train_mcc value: [0.52777851 0.54796937 0.53784095 0.53320213 0.54335651 0.54796937 0.53215286 0.52041792 0.52498987 0.50810087] mean value: 0.5323778367643669 key: test_accuracy value: [0.67391304 0.7173913 0.7173913 0.76086957 0.7173913 0.76086957 0.84782609 0.7826087 0.73913043 0.69565217] mean value: 0.741304347826087 key: train_accuracy value: [0.76328502 0.77294686 0.76811594 0.76570048 0.7705314 0.77294686 0.76570048 0.75845411 0.76086957 0.75362319] mean value: 0.7652173913043478 key: test_fscore value: [0.69387755 0.73469388 0.72340426 0.74418605 0.73469388 0.7755102 0.8627451 0.8 0.71428571 0.69565217] mean value: 0.7479048798272832 key: train_fscore value: [0.77102804 0.78240741 0.77674419 0.774942 0.78060046 0.78240741 0.77176471 0.7716895 0.77345538 0.76056338] mean value: 0.7745602456953206 key: test_precision value: [0.65384615 0.69230769 0.70833333 0.8 0.69230769 0.73076923 0.78571429 0.74074074 0.78947368 0.69565217] mean value: 0.7289144987142698 key: train_precision value: [0.74660633 0.75111111 0.74887892 0.74553571 0.74778761 0.75111111 0.75229358 0.73160173 0.73478261 0.73972603] mean value: 0.7449434751412146 key: test_recall value: [0.73913043 0.7826087 0.73913043 0.69565217 0.7826087 0.82608696 0.95652174 0.86956522 0.65217391 0.69565217] mean value: 0.7739130434782608 key: train_recall value: [0.79710145 0.81642512 0.80676329 0.80676329 0.81642512 0.81642512 0.79227053 0.81642512 0.81642512 0.7826087 ] mean value: 0.8067632850241546 key: test_roc_auc value: [0.67391304 0.7173913 0.7173913 0.76086957 0.7173913 0.76086957 0.84782609 0.7826087 0.73913043 0.69565217] mean value: 0.7413043478260869 key: train_roc_auc value: [0.76328502 0.77294686 0.76811594 0.76570048 0.7705314 0.77294686 0.76570048 0.75845411 0.76086957 0.75362319] mean value: 0.7652173913043478 key: test_jcc value: [0.53125 0.58064516 0.56666667 0.59259259 0.58064516 0.63333333 0.75862069 0.66666667 0.55555556 0.53333333] mean value: 0.5999309160383965 key: train_jcc value: [0.62737643 0.64258555 0.63498099 0.63257576 0.64015152 0.64258555 0.62835249 0.62825279 0.63059701 0.61363636] mean value: 0.6321094446924821 MCC on Blind test: 0.39 Accuracy on Blind test: 0.7 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01311207 0.01989889 0.01874471 0.01914358 0.01592255 0.01623392 0.02147722 0.03374481 0.03179073 0.01620793] mean value: 0.020627641677856447 key: score_time value: [0.0086832 0.01111603 0.01121593 0.01169395 0.01164913 0.01165795 0.01315379 0.01208568 0.01410437 0.01168466] mean value: 0.011704468727111816 key: test_mcc value: [0.60286056 0.35082321 0.2548236 0.48566186 0.56736651 0.53452248 0.70164642 0.56521739 0.65465367 0.34921515] mean value: 0.5066790855968193 key: train_mcc value: [0.66527661 0.7069807 0.59860241 0.73437665 0.71233921 0.62501968 0.66898551 0.77302805 0.66470211 0.33900469] mean value: 0.6488315629359197 key: test_accuracy value: [0.7826087 0.67391304 0.60869565 0.73913043 0.7826087 0.76086957 0.84782609 0.7826087 0.82608696 0.60869565] mean value: 0.741304347826087 key: train_accuracy value: [0.81884058 0.85024155 0.76811594 0.86714976 0.852657 0.80434783 0.82850242 0.88647343 0.82608696 0.61594203] mean value: 0.8118357487922705 key: test_fscore value: [0.81481481 0.69387755 0.47058824 0.71428571 0.79166667 0.73170732 0.85714286 0.7826087 0.83333333 0.35714286] mean value: 0.7047168042426114 key: train_fscore value: [0.84143763 0.83937824 0.70186335 0.86618005 0.86230248 0.77929155 0.84326711 0.88564477 0.84140969 0.39543726] mean value: 0.7856212140391424 key: test_precision value: [0.70967742 0.65384615 0.72727273 0.78947368 0.76 0.83333333 0.80769231 0.7826087 0.8 1. ] mean value: 0.7863904321362061 key: train_precision value: [0.7481203 0.90502793 0.9826087 0.87254902 0.80932203 0.89375 0.77642276 0.89215686 0.77327935 0.92857143] mean value: 0.8581808390641985 key: test_recall value: [0.95652174 0.73913043 0.34782609 0.65217391 0.82608696 0.65217391 0.91304348 0.7826087 0.86956522 0.2173913 ] mean value: 0.6956521739130435 key: train_recall value: [0.96135266 0.7826087 0.54589372 0.85990338 0.92270531 0.69082126 0.92270531 0.87922705 0.92270531 0.25120773] mean value: 0.7739130434782608 key: test_roc_auc value: [0.7826087 0.67391304 0.60869565 0.73913043 0.7826087 0.76086957 0.84782609 0.7826087 0.82608696 0.60869565] mean value: 0.741304347826087 key: train_roc_auc value: [0.81884058 0.85024155 0.76811594 0.86714976 0.852657 0.80434783 0.82850242 0.88647343 0.82608696 0.61594203] mean value: 0.8118357487922705 key: test_jcc value: [0.6875 0.53125 0.30769231 0.55555556 0.65517241 0.57692308 0.75 0.64285714 0.71428571 0.2173913 ] mean value: 0.5638627515454727 key: train_jcc value: [0.72627737 0.72321429 0.54066986 0.7639485 0.75793651 0.63839286 0.72900763 0.79475983 0.72623574 0.2464455 ] mean value: 0.6646888075360328 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02258778 0.02051401 0.02079272 0.02237797 0.02512527 0.01947999 0.02412772 0.02076626 0.02018929 0.02270794] mean value: 0.021866893768310545 key: score_time value: [0.01178718 0.01172805 0.01170397 0.01167297 0.01199746 0.01166201 0.01171231 0.01165533 0.01168275 0.01171136] mean value: 0.011731338500976563 key: test_mcc value: [0.62764591 0.45643546 0.65465367 0.69560834 0.52704628 0.47826087 0.42163702 0.53452248 0.56736651 0.57396402] mean value: 0.5537140580845729 key: train_mcc value: [0.67561452 0.76901382 0.76940988 0.77019104 0.5115435 0.73971411 0.41968603 0.67396079 0.77317244 0.76329393] mean value: 0.6865600066934703 key: test_accuracy value: [0.7826087 0.7173913 0.82608696 0.82608696 0.7173913 0.73913043 0.67391304 0.76086957 0.7826087 0.7826087 ] mean value: 0.7608695652173914 key: train_accuracy value: [0.82125604 0.88405797 0.88405797 0.87922705 0.71014493 0.86714976 0.64975845 0.81884058 0.88647343 0.88164251] mean value: 0.8282608695652174 key: test_fscore value: [0.72222222 0.75471698 0.81818182 0.78947368 0.60606061 0.73913043 0.74576271 0.73170732 0.77272727 0.76190476] mean value: 0.7441887810159469 key: train_fscore value: [0.78857143 0.88679245 0.88059701 0.86772487 0.59459459 0.87471526 0.74060823 0.78386167 0.88782816 0.88192771] mean value: 0.8187221394190056 key: test_precision value: [1. 0.66666667 0.85714286 1. 1. 0.73913043 0.61111111 0.83333333 0.80952381 0.84210526] mean value: 0.8359013475718281 key: train_precision value: [0.96503497 0.86635945 0.90769231 0.95906433 0.98876404 0.82758621 0.58806818 0.97142857 0.87735849 0.87980769] mean value: 0.8831164235178116 key: test_recall value: [0.56521739 0.86956522 0.7826087 0.65217391 0.43478261 0.73913043 0.95652174 0.65217391 0.73913043 0.69565217] mean value: 0.7086956521739131 key: train_recall value: [0.66666667 0.90821256 0.85507246 0.79227053 0.42512077 0.92753623 1. 0.65700483 0.89855072 0.88405797] mean value: 0.8014492753623188 key: test_roc_auc value: [0.7826087 0.7173913 0.82608696 0.82608696 0.7173913 0.73913043 0.67391304 0.76086957 0.7826087 0.7826087 ] mean value: 0.7608695652173912 key: train_roc_auc value: [0.82125604 0.88405797 0.88405797 0.87922705 0.71014493 0.86714976 0.64975845 0.81884058 0.88647343 0.88164251] mean value: 0.8282608695652174 key: test_jcc value: [0.56521739 0.60606061 0.69230769 0.65217391 0.43478261 0.5862069 0.59459459 0.57692308 0.62962963 0.61538462] mean value: 0.5953281024495417 key: train_jcc value: [0.6509434 0.79661017 0.78666667 0.76635514 0.42307692 0.77732794 0.58806818 0.64454976 0.79828326 0.7887931 ] mean value: 0.7020674540973326 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.17487621 0.16196847 0.16231751 0.16288209 0.16240931 0.16275215 0.16204476 0.16248679 0.16236567 0.16225672] mean value: 0.1636359691619873 key: score_time value: [0.01524496 0.01564002 0.01540661 0.01534367 0.01538467 0.01562142 0.01539731 0.01527524 0.0153079 0.01526213] mean value: 0.015388393402099609 key: test_mcc value: [0.78334945 0.74194083 0.65465367 0.6092718 0.87038828 0.78935222 0.82922798 0.61394061 0.66226618 0.91304348] mean value: 0.7467434495515832 key: train_mcc value: [0.9565329 0.98072211 0.99518069 0.99038439 0.96139753 0.9758568 0.96619485 0.99518069 0.99038439 0.97594791] mean value: 0.9787782268977475 key: test_accuracy value: [0.89130435 0.86956522 0.82608696 0.80434783 0.93478261 0.89130435 0.91304348 0.80434783 0.82608696 0.95652174] mean value: 0.8717391304347826 key: train_accuracy value: [0.97826087 0.99033816 0.99758454 0.99516908 0.98067633 0.98792271 0.98309179 0.99758454 0.99516908 0.98792271] mean value: 0.9893719806763285 key: test_fscore value: [0.88888889 0.875 0.83333333 0.8 0.93333333 0.89795918 0.91666667 0.81632653 0.80952381 0.95652174] mean value: 0.872755348516218 key: train_fscore value: [0.97831325 0.99038462 0.99757869 0.99519231 0.98076923 0.98795181 0.98313253 0.99757869 0.99519231 0.98800959] mean value: 0.989410302921394 key: test_precision value: [0.90909091 0.84 0.8 0.81818182 0.95454545 0.84615385 0.88 0.76923077 0.89473684 0.95652174] mean value: 0.8668461378438496 key: train_precision value: [0.97596154 0.98564593 1. 0.99043062 0.97607656 0.98557692 0.98076923 1. 0.99043062 0.98095238] mean value: 0.9865843805317489 key: test_recall value: [0.86956522 0.91304348 0.86956522 0.7826087 0.91304348 0.95652174 0.95652174 0.86956522 0.73913043 0.95652174] mean value: 0.8826086956521739 key: train_recall value: [0.98067633 0.99516908 0.99516908 1. 0.98550725 0.99033816 0.98550725 0.99516908 1. 0.99516908] mean value: 0.9922705314009662 key: test_roc_auc value: [0.89130435 0.86956522 0.82608696 0.80434783 0.93478261 0.89130435 0.91304348 0.80434783 0.82608696 0.95652174] mean value: 0.8717391304347826 key: train_roc_auc value: [0.97826087 0.99033816 0.99758454 0.99516908 0.98067633 0.98792271 0.98309179 0.99758454 0.99516908 0.98792271] mean value: 0.9893719806763285 key: test_jcc value: [0.8 0.77777778 0.71428571 0.66666667 0.875 0.81481481 0.84615385 0.68965517 0.68 0.91666667] mean value: 0.778102065877928 key: train_jcc value: [0.95754717 0.98095238 0.99516908 0.99043062 0.96226415 0.97619048 0.96682464 0.99516908 0.99043062 0.97630332] mean value: 0.9791281548253228 MCC on Blind test: 0.64 Accuracy on Blind test: 0.83 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.06977606 0.07538342 0.0865984 0.08907437 0.09228969 0.07616353 0.08706713 0.0904603 0.08801723 0.07870221] mean value: 0.08335323333740234 key: score_time value: [0.02260113 0.02591181 0.03777814 0.04038143 0.02711153 0.02351594 0.04033351 0.0242157 0.02397299 0.02499533] mean value: 0.02908174991607666 key: test_mcc value: [0.87038828 0.66226618 0.56521739 0.78935222 0.87038828 0.87038828 0.82922798 0.56521739 0.61394061 0.95742711] mean value: 0.759381372202104 key: train_mcc value: [0.97594791 0.9758568 0.98072211 0.98551875 0.96139753 0.98072211 0.98561076 0.96135266 0.97101449 0.97119583] mean value: 0.974933894100418 key: test_accuracy value: [0.93478261 0.82608696 0.7826087 0.89130435 0.93478261 0.93478261 0.91304348 0.7826087 0.80434783 0.97826087] mean value: 0.8782608695652174 key: train_accuracy value: [0.98792271 0.98792271 0.99033816 0.99275362 0.98067633 0.99033816 0.99275362 0.98067633 0.98550725 0.98550725] mean value: 0.9874396135265701 key: test_fscore value: [0.93617021 0.84 0.7826087 0.88372093 0.93333333 0.93617021 0.91666667 0.7826087 0.79069767 0.97777778] mean value: 0.8779754199265203 key: train_fscore value: [0.98800959 0.98789346 0.99029126 0.99273608 0.98076923 0.99029126 0.99270073 0.98067633 0.98550725 0.98536585] mean value: 0.9874241045783559 key: test_precision value: [0.91666667 0.77777778 0.7826087 0.95 0.95454545 0.91666667 0.88 0.7826087 0.85 1. ] mean value: 0.8810873956960914 key: train_precision value: [0.98095238 0.99029126 0.99512195 0.99514563 0.97607656 0.99512195 1. 0.98067633 0.98550725 0.99507389] mean value: 0.9893967198124055 key: test_recall value: [0.95652174 0.91304348 0.7826087 0.82608696 0.91304348 0.95652174 0.95652174 0.7826087 0.73913043 0.95652174] mean value: 0.8782608695652174 key: train_recall value: [0.99516908 0.98550725 0.98550725 0.99033816 0.98550725 0.98550725 0.98550725 0.98067633 0.98550725 0.97584541] mean value: 0.9855072463768116 key: test_roc_auc value: [0.93478261 0.82608696 0.7826087 0.89130435 0.93478261 0.93478261 0.91304348 0.7826087 0.80434783 0.97826087] mean value: 0.8782608695652174 key: train_roc_auc value: [0.98792271 0.98792271 0.99033816 0.99275362 0.98067633 0.99033816 0.99275362 0.98067633 0.98550725 0.98550725] mean value: 0.9874396135265701 key: test_jcc value: [0.88 0.72413793 0.64285714 0.79166667 0.875 0.88 0.84615385 0.64285714 0.65384615 0.95652174] mean value: 0.789304062254587 key: train_jcc value: [0.97630332 0.97607656 0.98076923 0.98557692 0.96226415 0.98076923 0.98550725 0.96208531 0.97142857 0.97115385] mean value: 0.975193438013435 MCC on Blind test: 0.68 Accuracy on Blind test: 0.84 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.12681055 0.12293315 0.11621809 0.12390661 0.1576407 0.11773539 0.14127946 0.11725688 0.15466142 0.18126297] mean value: 0.13597052097320556 key: score_time value: [0.02440906 0.01473045 0.02721024 0.02339292 0.02557945 0.02327824 0.03188467 0.02571344 0.02675581 0.02720642] mean value: 0.025016069412231445 key: test_mcc value: [ 0.34815531 -0.04637389 0.35082321 0.67556602 0.40533961 0.52623481 0.30905755 0.22075539 0.30550505 0.3927922 ] mean value: 0.3487855271090109 key: train_mcc value: [0.97613021 0.98085947 0.98561076 0.97613021 0.96673649 0.97142265 0.98085947 0.96673649 0.97142265 0.98561076] mean value: 0.9761519176586856 key: test_accuracy value: [0.67391304 0.47826087 0.67391304 0.82608696 0.69565217 0.76086957 0.65217391 0.60869565 0.65217391 0.69565217] mean value: 0.6717391304347826 key: train_accuracy value: [0.98792271 0.99033816 0.99275362 0.98792271 0.98309179 0.98550725 0.99033816 0.98309179 0.98550725 0.99275362] mean value: 0.9879227053140096 key: test_fscore value: [0.68085106 0.55555556 0.65116279 0.8 0.73076923 0.74418605 0.68 0.64 0.63636364 0.70833333] mean value: 0.6827221657060846 key: train_fscore value: [0.98806683 0.99043062 0.99280576 0.98806683 0.98337292 0.98571429 0.99043062 0.98337292 0.98571429 0.99280576] mean value: 0.9880780821020794 key: test_precision value: [0.66666667 0.48387097 0.7 0.94117647 0.65517241 0.8 0.62962963 0.59259259 0.66666667 0.68 ] mean value: 0.681577540767883 key: train_precision value: [0.97641509 0.98104265 0.98571429 0.97641509 0.96728972 0.97183099 0.98104265 0.96728972 0.97183099 0.98571429] mean value: 0.9764585479248011 key: test_recall value: [0.69565217 0.65217391 0.60869565 0.69565217 0.82608696 0.69565217 0.73913043 0.69565217 0.60869565 0.73913043] mean value: 0.6956521739130435 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.67391304 0.47826087 0.67391304 0.82608696 0.69565217 0.76086957 0.65217391 0.60869565 0.65217391 0.69565217] mean value: 0.6717391304347826 key: train_roc_auc value: [0.98792271 0.99033816 0.99275362 0.98792271 0.98309179 0.98550725 0.99033816 0.98309179 0.98550725 0.99275362] mean value: 0.9879227053140097 key: test_jcc value: [0.51612903 0.38461538 0.48275862 0.66666667 0.57575758 0.59259259 0.51515152 0.47058824 0.46666667 0.5483871 ] mean value: 0.5219313386466432 key: train_jcc value: [0.97641509 0.98104265 0.98571429 0.97641509 0.96728972 0.97183099 0.98104265 0.96728972 0.97183099 0.98571429] mean value: 0.9764585479248011 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.65061069 0.63841701 0.63427401 0.63759327 0.63641667 0.64082813 0.64226913 0.63239646 0.64211464 0.63958859] mean value: 0.6394508600234985 key: score_time value: [0.01019311 0.00947547 0.00929451 0.00928974 0.00994682 0.00945592 0.00938773 0.00934958 0.00958204 0.00936604] mean value: 0.009534096717834473 key: test_mcc value: [0.87038828 0.82922798 0.65465367 0.78935222 0.87038828 0.82922798 0.87705802 0.61394061 0.73913043 0.95742711] mean value: 0.803079458879572 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93478261 0.91304348 0.82608696 0.89130435 0.93478261 0.91304348 0.93478261 0.80434783 0.86956522 0.97826087] mean value: 0.9 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93617021 0.91666667 0.83333333 0.88372093 0.93333333 0.91666667 0.93877551 0.81632653 0.86956522 0.97777778] mean value: 0.9022336178983924 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91666667 0.88 0.8 0.95 0.95454545 0.88 0.88461538 0.76923077 0.86956522 1. ] mean value: 0.8904623492449579 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.95652174 0.95652174 0.86956522 0.82608696 0.91304348 0.95652174 1. 0.86956522 0.86956522 0.95652174] mean value: 0.9173913043478261 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93478261 0.91304348 0.82608696 0.89130435 0.93478261 0.91304348 0.93478261 0.80434783 0.86956522 0.97826087] mean value: 0.9 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88 0.84615385 0.71428571 0.79166667 0.875 0.84615385 0.88461538 0.68965517 0.76923077 0.95652174] mean value: 0.8253283138650455 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.69 Accuracy on Blind test: 0.84 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.0364356 0.02996302 0.02851534 0.02908731 0.02904654 0.02905083 0.02902651 0.02826858 0.02909303 0.02910805] mean value: 0.029759478569030762 key: score_time value: [0.01263356 0.01271057 0.01395392 0.01488853 0.015131 0.0149157 0.01497221 0.01506925 0.01482534 0.01487589] mean value: 0.014397597312927246 key: test_mcc value: [0.22941573 0.15430335 0.31622777 0.12909944 0. 0.10540926 0.06052275 0.26413527 0.16439899 0.25819889] mean value: 0.1681711452279104 key: train_mcc value: [0.33601075 0.35786226 0.32249031 0.32249031 0.33154121 0.38316368 0.34484623 0.33154121 0.35786226 0.33601075] mean value: 0.3423818953924683 key: test_accuracy value: [0.58695652 0.54347826 0.63043478 0.54347826 0.5 0.54347826 0.52173913 0.56521739 0.56521739 0.58695652] mean value: 0.558695652173913 key: train_accuracy value: [0.60144928 0.61352657 0.5942029 0.5942029 0.59903382 0.62801932 0.60628019 0.59903382 0.61352657 0.60144928] mean value: 0.605072463768116 key: test_fscore value: [0.68852459 0.67692308 0.71186441 0.66666667 0.64615385 0.6440678 0.64516129 0.6969697 0.66666667 0.6984127 ] mean value: 0.6741410735668998 key: train_fscore value: [0.71502591 0.72125436 0.71134021 0.71134021 0.7137931 0.72887324 0.71750433 0.7137931 0.72125436 0.71502591] mean value: 0.7169204715732834 key: test_precision value: [0.55263158 0.52380952 0.58333333 0.525 0.5 0.52777778 0.51282051 0.53488372 0.54054054 0.55 ] mean value: 0.535079698815929 key: train_precision value: [0.55645161 0.5640327 0.552 0.552 0.55495979 0.5734072 0.55945946 0.55495979 0.5640327 0.55645161] mean value: 0.5587754853622922 key: test_recall value: [0.91304348 0.95652174 0.91304348 0.91304348 0.91304348 0.82608696 0.86956522 1. 0.86956522 0.95652174] mean value: 0.9130434782608695 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.58695652 0.54347826 0.63043478 0.54347826 0.5 0.54347826 0.52173913 0.56521739 0.56521739 0.58695652] mean value: 0.558695652173913 key: train_roc_auc value: [0.60144928 0.61352657 0.5942029 0.5942029 0.59903382 0.62801932 0.60628019 0.59903382 0.61352657 0.60144928] mean value: 0.605072463768116 key: test_jcc value: [0.525 0.51162791 0.55263158 0.5 0.47727273 0.475 0.47619048 0.53488372 0.5 0.53658537] mean value: 0.5089191776171207 key: train_jcc value: [0.55645161 0.5640327 0.552 0.552 0.55495979 0.5734072 0.55945946 0.55495979 0.5640327 0.55645161] mean value: 0.5587754853622922 MCC on Blind test: 0.04 Accuracy on Blind test: 0.46 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02925348 0.03742146 0.03198218 0.05191994 0.03705192 0.0369761 0.03693676 0.03724885 0.0375576 0.03748274] mean value: 0.037383103370666505 key: score_time value: [0.02401781 0.02413416 0.02064085 0.0163424 0.02239585 0.02346134 0.02073121 0.02163601 0.02227831 0.02286959] mean value: 0.02185075283050537 key: test_mcc value: [0.56736651 0.41736501 0.6092718 0.6092718 0.52223297 0.43519414 0.57396402 0.62360956 0.6092718 0.74194083] mean value: 0.5709488428748447 key: train_mcc value: [0.76529696 0.78415661 0.78876611 0.77404053 0.779088 0.78526132 0.76901382 0.77956276 0.77447567 0.75973177] mean value: 0.7759393536743577 key: test_accuracy value: [0.7826087 0.69565217 0.80434783 0.80434783 0.76086957 0.7173913 0.7826087 0.80434783 0.80434783 0.86956522] mean value: 0.782608695652174 key: train_accuracy value: [0.88164251 0.89130435 0.89371981 0.88647343 0.88888889 0.89130435 0.88405797 0.88888889 0.88647343 0.87922705] mean value: 0.8871980676328503 key: test_fscore value: [0.79166667 0.74074074 0.80851064 0.8 0.76595745 0.71111111 0.8 0.82352941 0.80851064 0.86363636] mean value: 0.7913663017323843 key: train_fscore value: [0.88578089 0.89461358 0.89671362 0.88941176 0.89201878 0.89559165 0.88679245 0.89252336 0.88992974 0.88262911] mean value: 0.8906004943009075 key: test_precision value: [0.76 0.64516129 0.79166667 0.81818182 0.75 0.72727273 0.74074074 0.75 0.79166667 0.9047619 ] mean value: 0.7679451814613105 key: train_precision value: [0.85585586 0.86818182 0.87214612 0.86697248 0.86757991 0.86160714 0.86635945 0.86425339 0.86363636 0.85844749] mean value: 0.8645040014246903 key: test_recall value: [0.82608696 0.86956522 0.82608696 0.7826087 0.7826087 0.69565217 0.86956522 0.91304348 0.82608696 0.82608696] mean value: 0.8217391304347826 key: train_recall value: [0.9178744 0.92270531 0.92270531 0.91304348 0.9178744 0.93236715 0.90821256 0.92270531 0.9178744 0.90821256] mean value: 0.9183574879227053 key: test_roc_auc value: [0.7826087 0.69565217 0.80434783 0.80434783 0.76086957 0.7173913 0.7826087 0.80434783 0.80434783 0.86956522] mean value: 0.782608695652174 key: train_roc_auc value: [0.88164251 0.89130435 0.89371981 0.88647343 0.88888889 0.89130435 0.88405797 0.88888889 0.88647343 0.87922705] mean value: 0.8871980676328503 key: test_jcc value: [0.65517241 0.58823529 0.67857143 0.66666667 0.62068966 0.55172414 0.66666667 0.7 0.67857143 0.76 ] mean value: 0.6566297691490389 key: train_jcc value: [0.79497908 0.80932203 0.81276596 0.80084746 0.80508475 0.81092437 0.79661017 0.80590717 0.80168776 0.78991597] mean value: 0.8028044716567692 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.25978518 0.2757678 0.26558924 0.27017879 0.34030342 0.40387106 0.33534956 0.35011554 0.27279735 0.26528931] mean value: 0.30390472412109376 key: score_time value: [0.02251339 0.02069664 0.0212121 0.02095771 0.02273417 0.02089667 0.02371645 0.02372837 0.02070165 0.02153325] mean value: 0.02186903953552246 key: test_mcc value: [0.54772256 0.32461723 0.6092718 0.43519414 0.52223297 0.43519414 0.57396402 0.62360956 0.6092718 0.74194083] mean value: 0.5423019036532302 key: train_mcc value: [0.68663964 0.83610009 0.78876611 0.71565259 0.779088 0.78526132 0.76901382 0.77956276 0.77447567 0.75973177] mean value: 0.7674291764947905 key: test_accuracy value: [0.76086957 0.65217391 0.80434783 0.7173913 0.76086957 0.7173913 0.7826087 0.80434783 0.80434783 0.86956522] mean value: 0.7673913043478261 key: train_accuracy value: [0.84299517 0.9178744 0.89371981 0.85748792 0.88888889 0.89130435 0.88405797 0.88888889 0.88647343 0.87922705] mean value: 0.8830917874396135 key: test_fscore value: [0.79245283 0.7037037 0.80851064 0.71111111 0.76595745 0.71111111 0.8 0.82352941 0.80851064 0.86363636] mean value: 0.778852325491993 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:176: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:179: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.8463357 0.91904762 0.89671362 0.86052009 0.89201878 0.89559165 0.88679245 0.89252336 0.88992974 0.88262911] mean value: 0.8862102120393928 key: test_precision value: [0.7 0.61290323 0.79166667 0.72727273 0.75 0.72727273 0.74074074 0.75 0.79166667 0.9047619 ] mean value: 0.7496284659187885 key: train_precision value: [0.8287037 0.90610329 0.87214612 0.84259259 0.86757991 0.86160714 0.86635945 0.86425339 0.86363636 0.85844749] mean value: 0.8631429445826281 key: test_recall value: [0.91304348 0.82608696 0.82608696 0.69565217 0.7826087 0.69565217 0.86956522 0.91304348 0.82608696 0.82608696] mean value: 0.817391304347826 key: train_recall value: [0.8647343 0.93236715 0.92270531 0.87922705 0.9178744 0.93236715 0.90821256 0.92270531 0.9178744 0.90821256] mean value: 0.9106280193236715 key: test_roc_auc value: [0.76086957 0.65217391 0.80434783 0.7173913 0.76086957 0.7173913 0.7826087 0.80434783 0.80434783 0.86956522] mean value: 0.7673913043478261 key: train_roc_auc value: [0.84299517 0.9178744 0.89371981 0.85748792 0.88888889 0.89130435 0.88405797 0.88888889 0.88647343 0.87922705] mean value: 0.8830917874396136 key: test_jcc value: [0.65625 0.54285714 0.67857143 0.55172414 0.62068966 0.55172414 0.66666667 0.7 0.67857143 0.76 ] mean value: 0.640705459770115 key: train_jcc value: [0.73360656 0.85022026 0.81276596 0.75518672 0.80508475 0.81092437 0.79661017 0.80590717 0.80168776 0.78991597] mean value: 0.7961909689230291 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.05583358 0.03908515 0.05536318 0.1265595 0.0895493 0.03901505 0.03782654 0.03889012 0.03748798 0.03715825] mean value: 0.05567686557769776 key: score_time value: [0.01862597 0.01223898 0.01556611 0.01722574 0.01222587 0.01617336 0.01213646 0.01217461 0.01655126 0.01313186] mean value: 0.014605021476745606 key: test_mcc value: [0.438357 0.59404013 0.56360186 0.71910121 0.66853948 0.61895161 0.52419355 0.58770161 0.55544355 0.6385282 ] mean value: 0.5908458209912425 key: train_mcc value: [0.7266021 0.72302514 0.70930045 0.71271957 0.72152762 0.6778952 0.71725106 0.70964977 0.72431729 0.73162292] mean value: 0.7153911111468142 key: test_accuracy value: [0.71875 0.796875 0.78125 0.859375 0.82539683 0.80952381 0.76190476 0.79365079 0.77777778 0.80952381] mean value: 0.7934027777777778 key: train_accuracy value: [0.86315789 0.86140351 0.85438596 0.85614035 0.85989492 0.83887916 0.85814361 0.85464098 0.86164623 0.86514886] mean value: 0.8573441484622238 key: test_fscore value: [0.70967742 0.8 0.77419355 0.85714286 0.84057971 0.80645161 0.76190476 0.79365079 0.78125 0.83333333] mean value: 0.7958184036821835 key: train_fscore value: [0.8650519 0.86308492 0.85714286 0.85862069 0.86486486 0.84083045 0.86201022 0.8566494 0.86495726 0.86882453] mean value: 0.8602037100062495 key: test_precision value: [0.73333333 0.78787879 0.8 0.87096774 0.76315789 0.80645161 0.75 0.80645161 0.78125 0.75 ] mean value: 0.7849490983690899 key: train_precision value: [0.85324232 0.85273973 0.84121622 0.8440678 0.83660131 0.83219178 0.84053156 0.84353741 0.84333333 0.84437086] mean value: 0.8431832318372622 key: test_recall value: [0.6875 0.8125 0.75 0.84375 0.93548387 0.80645161 0.77419355 0.78125 0.78125 0.9375 ] mean value: 0.8109879032258065 key: train_recall value: [0.87719298 0.87368421 0.87368421 0.87368421 0.8951049 0.84965035 0.88461538 0.87017544 0.8877193 0.89473684] mean value: 0.8780247822353086 key: test_roc_auc value: [0.71875 0.796875 0.78125 0.859375 0.82711694 0.80947581 0.76209677 0.79385081 0.77772177 0.80745968] mean value: 0.7933971774193548 key: train_roc_auc value: [0.86315789 0.86140351 0.85438596 0.85614035 0.85983315 0.83886026 0.85809717 0.85466814 0.86169182 0.86520059] mean value: 0.8573438841859895 key: test_jcc value: [0.55 0.66666667 0.63157895 0.75 0.725 0.67567568 0.61538462 0.65789474 0.64102564 0.71428571] mean value: 0.6627511997248839 key: train_jcc value: [0.76219512 0.75914634 0.75 0.75226586 0.76190476 0.72537313 0.75748503 0.74924471 0.76204819 0.76807229] mean value: 0.7547735445533712 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.82782364 0.99994683 0.83809996 0.88531876 1.0512445 0.87437463 0.99219942 0.85140181 0.94379377 0.99876714] mean value: 0.9262970447540283 key: score_time value: [0.01321125 0.01522207 0.01573372 0.01344991 0.02312303 0.01547813 0.01549244 0.01336527 0.01602554 0.01229239] mean value: 0.015339374542236328 key: test_mcc value: [0.62622429 0.46897905 0.6644106 0.72192954 0.69609023 0.69609023 0.58770161 0.81644514 0.71443023 0.60087592] mean value: 0.6593176855565691 key: train_mcc value: [0.84298269 0.86700831 0.85614562 0.88850724 0.87783724 0.8706586 0.8708281 0.87444928 0.86430436 0.8887091 ] mean value: 0.8701430540050656 key: test_accuracy value: [0.8125 0.734375 0.828125 0.859375 0.84126984 0.84126984 0.79365079 0.9047619 0.85714286 0.79365079] mean value: 0.8266121031746032 key: train_accuracy value: [0.92105263 0.93333333 0.92807018 0.94385965 0.93870403 0.9352014 0.9352014 0.93695271 0.93169877 0.94395797] mean value: 0.9348032076689096 key: test_fscore value: [0.81818182 0.73846154 0.81355932 0.86567164 0.85294118 0.85294118 0.79365079 0.9 0.86153846 0.81690141] mean value: 0.8313847337049435 key: train_fscore value: [0.92281304 0.93425606 0.92819615 0.94501718 0.93975904 0.93609672 0.9363167 0.93793103 0.93310463 0.94501718] mean value: 0.935850771843356 key: test_precision value: [0.79411765 0.72727273 0.88888889 0.82857143 0.78378378 0.78378378 0.78125 0.96428571 0.84848485 0.74358974] mean value: 0.8144028565719742 key: train_precision value: [0.90268456 0.92150171 0.92657343 0.92592593 0.92542373 0.92491468 0.9220339 0.9220339 0.91275168 0.92592593] mean value: 0.9209769427712305 key: test_recall value: [0.84375 0.75 0.75 0.90625 0.93548387 0.93548387 0.80645161 0.84375 0.875 0.90625 ] mean value: 0.855241935483871 key: train_recall value: [0.94385965 0.94736842 0.92982456 0.96491228 0.95454545 0.94755245 0.95104895 0.95438596 0.95438596 0.96491228] mean value: 0.951279597595387 key: test_roc_auc value: [0.8125 0.734375 0.828125 0.859375 0.84274194 0.84274194 0.79385081 0.90574597 0.85685484 0.79183468] mean value: 0.8268145161290322 key: train_roc_auc value: [0.92105263 0.93333333 0.92807018 0.94385965 0.93867624 0.93517973 0.9351736 0.93698319 0.93173844 0.9439946 ] mean value: 0.9348061587535271 key: test_jcc value: [0.69230769 0.58536585 0.68571429 0.76315789 0.74358974 0.74358974 0.65789474 0.81818182 0.75675676 0.69047619] mean value: 0.7137034715853715 key: train_jcc value: [0.8566879 0.87662338 0.86601307 0.89576547 0.88636364 0.87987013 0.8802589 0.88311688 0.87459807 0.89576547] mean value: 0.8795062910999956 MCC on Blind test: 0.43 Accuracy on Blind test: 0.72 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01493692 0.01272726 0.01099396 0.01064897 0.01025271 0.0109911 0.01050425 0.01111269 0.01149487 0.01131725] mean value: 0.011497998237609863 key: score_time value: [0.01239252 0.01026893 0.00940967 0.00892758 0.00909019 0.00944972 0.00938582 0.00955534 0.00960016 0.00955343] mean value: 0.009763336181640625 key: test_mcc value: [0.37573457 0.47082362 0.438357 0.50097943 0.33569416 0.40327957 0.36491935 0.5570134 0.39656932 0.27016129] mean value: 0.4113531727056431 key: train_mcc value: [0.48791156 0.48146107 0.49533484 0.4881522 0.48862061 0.48523603 0.50266258 0.51676791 0.47481655 0.40924848] mean value: 0.48302118291664714 key: test_accuracy value: [0.6875 0.734375 0.71875 0.75 0.66666667 0.6984127 0.68253968 0.77777778 0.6984127 0.63492063] mean value: 0.7049355158730158 key: train_accuracy value: [0.74385965 0.74035088 0.74736842 0.74385965 0.74430823 0.74255692 0.75131349 0.75831874 0.73730298 0.69527145] mean value: 0.7404510400344118 key: test_fscore value: [0.67741935 0.72131148 0.70967742 0.75757576 0.67692308 0.71641791 0.67741935 0.77419355 0.70769231 0.63492063] mean value: 0.7053550840388729 key: train_fscore value: [0.74740484 0.74744027 0.75342466 0.74914089 0.7456446 0.74611399 0.75347222 0.76041667 0.74048443 0.64049587] mean value: 0.7384038442996906 key: test_precision value: [0.7 0.75862069 0.73333333 0.73529412 0.64705882 0.66666667 0.67741935 0.8 0.6969697 0.64516129] mean value: 0.706052397296263 key: train_precision value: [0.73720137 0.72757475 0.73578595 0.73400673 0.74305556 0.73720137 0.74827586 0.75257732 0.73037543 0.77889447] mean value: 0.7424948804585102 key: test_recall value: [0.65625 0.6875 0.6875 0.78125 0.70967742 0.77419355 0.67741935 0.75 0.71875 0.625 ] mean value: 0.7067540322580645 key: train_recall value: [0.75789474 0.76842105 0.77192982 0.76491228 0.74825175 0.75524476 0.75874126 0.76842105 0.75087719 0.54385965] mean value: 0.7388553551711446 key: test_roc_auc value: [0.6875 0.734375 0.71875 0.75 0.66733871 0.69959677 0.68245968 0.77822581 0.69808468 0.63508065] mean value: 0.705141129032258 key: train_roc_auc value: [0.74385965 0.74035088 0.74736842 0.74385965 0.74430131 0.74253466 0.75130045 0.7583364 0.73732671 0.69500675] mean value: 0.7404244877929089 key: test_jcc value: [0.51219512 0.56410256 0.55 0.6097561 0.51162791 0.55813953 0.51219512 0.63157895 0.54761905 0.46511628] mean value: 0.546233062148368 key: train_jcc value: [0.59668508 0.59673025 0.6043956 0.5989011 0.59444444 0.59504132 0.60445682 0.61344538 0.58791209 0.47112462] mean value: 0.5863136708796407 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01242709 0.01062965 0.01044965 0.01063824 0.01059484 0.01070356 0.01121044 0.01069164 0.01062679 0.01107025] mean value: 0.010904216766357422 key: score_time value: [0.00974107 0.00936651 0.00907683 0.00916839 0.00917578 0.00919271 0.00908542 0.00898957 0.00906658 0.00909472] mean value: 0.009195756912231446 key: test_mcc value: [0.4113018 0.34391797 0.56694671 0.6011334 0.52679717 0.42986904 0.39656932 0.42986904 0.40025188 0.38166127] mean value: 0.44883175921777024 key: train_mcc value: [0.55727849 0.55087719 0.55689066 0.54886043 0.51705741 0.53466669 0.55324733 0.5320108 0.51729972 0.55982989] mean value: 0.542801862027514 key: test_accuracy value: [0.703125 0.671875 0.78125 0.796875 0.76190476 0.71428571 0.6984127 0.71428571 0.6984127 0.68253968] mean value: 0.722296626984127 key: train_accuracy value: [0.77719298 0.7754386 0.77719298 0.77368421 0.75831874 0.76707531 0.77583187 0.76532399 0.75831874 0.7793345 ] mean value: 0.7707711924294097 key: test_fscore value: [0.6779661 0.66666667 0.79411765 0.8115942 0.76923077 0.71875 0.68852459 0.70967742 0.72463768 0.72972973] mean value: 0.7290894807957649 key: train_fscore value: [0.78797997 0.7754386 0.78726968 0.78172589 0.76369863 0.77264957 0.78451178 0.77288136 0.76369863 0.78571429] mean value: 0.7775568392250982 key: test_precision value: [0.74074074 0.67741935 0.75 0.75675676 0.73529412 0.6969697 0.7 0.73333333 0.67567568 0.64285714] mean value: 0.7109046818819115 key: train_precision value: [0.75159236 0.7754386 0.75320513 0.75490196 0.74832215 0.75585284 0.75649351 0.74754098 0.7458194 0.76237624] mean value: 0.7551543158346077 key: test_recall value: [0.625 0.65625 0.84375 0.875 0.80645161 0.74193548 0.67741935 0.6875 0.78125 0.84375 ] mean value: 0.7538306451612903 key: train_recall value: [0.82807018 0.7754386 0.8245614 0.81052632 0.77972028 0.79020979 0.81468531 0.8 0.78245614 0.81052632] mean value: 0.8016194331983806 key: test_roc_auc value: [0.703125 0.671875 0.78125 0.796875 0.76260081 0.71471774 0.69808468 0.71471774 0.69707661 0.67993952] mean value: 0.7220262096774194 key: train_roc_auc value: [0.77719298 0.7754386 0.77719298 0.77368421 0.75828119 0.76703472 0.77576371 0.76538462 0.75836094 0.77938903] mean value: 0.7707722978775611 key: test_jcc value: [0.51282051 0.5 0.65853659 0.68292683 0.625 0.56097561 0.525 0.55 0.56818182 0.57446809] mean value: 0.5757909440498958 key: train_jcc value: [0.65013774 0.63323782 0.64917127 0.64166667 0.61772853 0.62952646 0.64542936 0.62983425 0.61772853 0.64705882] mean value: 0.63615194674427 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00994682 0.01091051 0.00977302 0.00970244 0.00984812 0.0096705 0.00967932 0.01085186 0.0097518 0.01083469] mean value: 0.010096907615661621 key: score_time value: [0.01667738 0.01324749 0.01305437 0.01258993 0.01238918 0.01241064 0.01239824 0.0128274 0.01290751 0.0134058 ] mean value: 0.013190793991088866 key: test_mcc value: [0.40644851 0.25048972 0.21971769 0.4375 0.40327957 0.14664712 0.46068548 0.17439516 0.23761484 0.20588616] mean value: 0.2942664249154479 key: train_mcc value: [0.57138681 0.60711092 0.61328465 0.58881348 0.6061167 0.61431658 0.61350495 0.61733573 0.60786984 0.59184699] mean value: 0.6031586655181542 key: test_accuracy value: [0.703125 0.625 0.609375 0.71875 0.6984127 0.57142857 0.73015873 0.58730159 0.61904762 0.6031746 ] mean value: 0.6465773809523809 key: train_accuracy value: [0.78421053 0.80175439 0.80526316 0.79298246 0.80210158 0.8056042 0.8056042 0.80735552 0.80210158 0.79509632] mean value: 0.800207392386395 key: test_fscore value: [0.6984127 0.63636364 0.62686567 0.71875 0.71641791 0.59701493 0.73015873 0.59375 0.63636364 0.62686567] mean value: 0.6580962880403178 key: train_fscore value: [0.79465776 0.81198003 0.81407035 0.80267559 0.81008403 0.81530782 0.81407035 0.81543624 0.81198003 0.80203046] mean value: 0.8092292670672316 key: test_precision value: [0.70967742 0.61764706 0.6 0.71875 0.66666667 0.55555556 0.71875 0.59375 0.61764706 0.6 ] mean value: 0.639844375922412 key: train_precision value: [0.75796178 0.7721519 0.77884615 0.76677316 0.77993528 0.77777778 0.78135048 0.78135048 0.7721519 0.7745098 ] mean value: 0.7742808719103773 key: test_recall value: [0.6875 0.65625 0.65625 0.71875 0.77419355 0.64516129 0.74193548 0.59375 0.65625 0.65625 ] mean value: 0.6786290322580645 key: train_recall value: [0.83508772 0.85614035 0.85263158 0.84210526 0.84265734 0.85664336 0.84965035 0.85263158 0.85614035 0.83157895] mean value: 0.8475266838424733 key: test_roc_auc value: [0.703125 0.625 0.609375 0.71875 0.69959677 0.57258065 0.73034274 0.58719758 0.61844758 0.60231855] mean value: 0.6466733870967741 key: train_roc_auc value: [0.78421053 0.80175439 0.80526316 0.79298246 0.80203043 0.80551466 0.80552693 0.80743467 0.80219605 0.7951601 ] mean value: 0.800207336523126 key: test_jcc value: [0.53658537 0.46666667 0.45652174 0.56097561 0.55813953 0.42553191 0.575 0.42222222 0.46666667 0.45652174] mean value: 0.4924831459203519 key: train_jcc value: [0.65927978 0.68347339 0.68644068 0.67039106 0.68079096 0.68820225 0.68644068 0.68838527 0.68347339 0.66949153] mean value: 0.6796368976678084 MCC on Blind test: 0.25 Accuracy on Blind test: 0.63 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.02824759 0.03029752 0.03228378 0.03044891 0.03006387 0.02939272 0.03179431 0.03097677 0.02952504 0.03144264] mean value: 0.03044731616973877 key: score_time value: [0.01378345 0.0138123 0.01368737 0.014395 0.01452589 0.01469469 0.01457 0.01469707 0.01411414 0.01394987] mean value: 0.014222979545593262 key: test_mcc value: [0.5625 0.5625 0.5 0.65915306 0.63159952 0.42986904 0.58728587 0.5570134 0.46146899 0.56449867] mean value: 0.5515888547643149 key: train_mcc value: [0.70737384 0.67019194 0.70637719 0.66638172 0.69418033 0.69378974 0.68851772 0.69288686 0.70502882 0.68838037] mean value: 0.6913108540752241 key: test_accuracy value: [0.78125 0.78125 0.75 0.828125 0.80952381 0.71428571 0.79365079 0.77777778 0.73015873 0.77777778] mean value: 0.7743799603174604 key: train_accuracy value: [0.85263158 0.83508772 0.85263158 0.83157895 0.84588441 0.84588441 0.84238179 0.84588441 0.85113835 0.8441331 ] mean value: 0.8447236304421298 key: test_fscore value: [0.78125 0.78125 0.75 0.8358209 0.82352941 0.71875 0.78688525 0.77419355 0.74626866 0.8 ] mean value: 0.7797947758292249 key: train_fscore value: [0.85810811 0.83566434 0.85665529 0.83946488 0.85234899 0.85185185 0.85049834 0.84982935 0.85714286 0.84521739] mean value: 0.8496781400811892 key: test_precision value: [0.78125 0.78125 0.75 0.8 0.75675676 0.6969697 0.8 0.8 0.71428571 0.73684211] mean value: 0.7617354273275326 key: train_precision value: [0.82736156 0.83275261 0.83388704 0.80191693 0.81935484 0.82142857 0.81012658 0.82724252 0.82258065 0.83793103] mean value: 0.8234582349832773 key: test_recall value: [0.78125 0.78125 0.75 0.875 0.90322581 0.74193548 0.77419355 0.75 0.78125 0.875 ] mean value: 0.8013104838709677 key: train_recall value: [0.89122807 0.83859649 0.88070175 0.88070175 0.88811189 0.88461538 0.8951049 0.87368421 0.89473684 0.85263158] mean value: 0.8780112869586554 key: test_roc_auc value: [0.78125 0.78125 0.75 0.828125 0.8109879 0.71471774 0.79334677 0.77822581 0.72933468 0.77620968] mean value: 0.7743447580645161 key: train_roc_auc value: [0.85263158 0.83508772 0.85263158 0.83157895 0.84581033 0.84581646 0.84228929 0.84593301 0.85121457 0.84414796] mean value: 0.8447141455036192 key: test_jcc value: [0.64102564 0.64102564 0.6 0.71794872 0.7 0.56097561 0.64864865 0.63157895 0.5952381 0.66666667] mean value: 0.6403107967677929 key: train_jcc value: [0.75147929 0.71771772 0.74925373 0.72334294 0.74269006 0.74193548 0.73988439 0.7388724 0.75 0.73192771] mean value: 0.7387103728301385 MCC on Blind test: 0.41 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.97375131 2.03104806 1.21565342 1.63651228 2.01255965 2.13570857 2.18256807 2.14106631 2.27216601 2.22363091] mean value: 1.982466459274292 key: score_time value: [0.01512861 0.01351404 0.0125792 0.01248193 0.01263905 0.01541376 0.01556706 0.01482749 0.014992 0.01525855] mean value: 0.014240169525146484 key: test_mcc value: [0.59404013 0.50097943 0.625 0.65915306 0.61445255 0.55544355 0.65085805 0.62475802 0.69429215 0.58371723] mean value: 0.6102694177898594 key: train_mcc value: [0.9579891 0.96493604 0.86803566 0.9444412 0.95517197 0.95806136 0.9719789 0.9650692 0.9650692 0.9581815 ] mean value: 0.950893411855093 key: test_accuracy value: [0.796875 0.75 0.8125 0.828125 0.79365079 0.77777778 0.82539683 0.80952381 0.84126984 0.77777778] mean value: 0.8012896825396826 key: train_accuracy value: [0.97894737 0.98245614 0.93333333 0.97192982 0.97723292 0.97898424 0.98598949 0.98248687 0.98248687 0.97898424] mean value: 0.9752831290134267 key: test_fscore value: [0.8 0.74193548 0.8125 0.8358209 0.81690141 0.77419355 0.81967213 0.8 0.85714286 0.81081081] mean value: 0.8068977135332366 key: train_fscore value: [0.97909408 0.98239437 0.93515358 0.97241379 0.97770154 0.97916667 0.98601399 0.9825784 0.9825784 0.97916667] mean value: 0.9756261477085117 key: test_precision value: [0.78787879 0.76666667 0.8125 0.8 0.725 0.77419355 0.83333333 0.85714286 0.78947368 0.71428571] mean value: 0.7860474591904982 key: train_precision value: [0.97231834 0.98586572 0.910299 0.9559322 0.95959596 0.97241379 0.98601399 0.97577855 0.97577855 0.96907216] mean value: 0.9663068267281514 key: test_recall value: [0.8125 0.71875 0.8125 0.875 0.93548387 0.77419355 0.80645161 0.75 0.9375 0.9375 ] mean value: 0.8359879032258064 key: train_recall value: [0.98596491 0.97894737 0.96140351 0.98947368 0.9965035 0.98601399 0.98601399 0.98947368 0.98947368 0.98947368] mean value: 0.9852741994847258 key: test_roc_auc value: [0.796875 0.75 0.8125 0.828125 0.79586694 0.77772177 0.82510081 0.81048387 0.83971774 0.77520161] mean value: 0.8011592741935484 key: train_roc_auc value: [0.97894737 0.98245614 0.93333333 0.97192982 0.97719912 0.97897191 0.98598945 0.98249908 0.98249908 0.97900258] mean value: 0.9752827873880505 key: test_jcc value: [0.66666667 0.58974359 0.68421053 0.71794872 0.69047619 0.63157895 0.69444444 0.66666667 0.75 0.68181818] mean value: 0.6773553931448668 key: train_jcc value: [0.95904437 0.96539792 0.87820513 0.94630872 0.95637584 0.95918367 0.97241379 0.96575342 0.96575342 0.95918367] mean value: 0.9527619973796925 MCC on Blind test: 0.41 Accuracy on Blind test: 0.72 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.04272532 0.03708005 0.03343678 0.0325079 0.0369761 0.03386879 0.03489923 0.03392696 0.03346109 0.03364778] mean value: 0.03525300025939941 key: score_time value: [0.0112803 0.00947237 0.00918174 0.00996089 0.0099721 0.00926304 0.00975418 0.00947094 0.0093956 0.00996923] mean value: 0.009772038459777832 key: test_mcc value: [0.84748251 0.75592895 0.71910121 0.78163175 0.79833297 0.74596774 0.80947581 0.74772995 0.75156646 0.52371369] mean value: 0.7480931026695484 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.921875 0.875 0.859375 0.890625 0.88888889 0.87301587 0.9047619 0.87301587 0.87301587 0.76190476] mean value: 0.8721478174603174 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.92537313 0.86666667 0.85714286 0.88888889 0.89855072 0.87096774 0.90322581 0.87096774 0.88235294 0.76923077] mean value: 0.8733367272394272 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88571429 0.92857143 0.87096774 0.90322581 0.81578947 0.87096774 0.90322581 0.9 0.83333333 0.75757576] mean value: 0.866937137565321 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96875 0.8125 0.84375 0.875 1. 0.87096774 0.90322581 0.84375 0.9375 0.78125 ] mean value: 0.8836693548387097 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.921875 0.875 0.859375 0.890625 0.890625 0.87298387 0.9047379 0.8734879 0.87197581 0.76159274] mean value: 0.8722278225806451 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.86111111 0.76470588 0.75 0.8 0.81578947 0.77142857 0.82352941 0.77142857 0.78947368 0.625 ] mean value: 0.7772466705980638 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.63 Accuracy on Blind test: 0.82 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.1539166 0.15231609 0.15183663 0.15490413 0.15403032 0.14877081 0.15526795 0.15582466 0.15328646 0.15435219] mean value: 0.1534505844116211 key: score_time value: [0.01990318 0.02001739 0.01951146 0.02007031 0.01984763 0.01978254 0.0202117 0.02009964 0.01998091 0.01975799] mean value: 0.019918274879455567 key: test_mcc value: [0.71910121 0.5625 0.62622429 0.68884672 0.63159952 0.53159579 0.61982085 0.68245968 0.55544355 0.56449867] mean value: 0.6182090274709215 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.859375 0.78125 0.8125 0.84375 0.80952381 0.76190476 0.80952381 0.84126984 0.77777778 0.77777778] mean value: 0.8074652777777778 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86153846 0.78125 0.81818182 0.84848485 0.82352941 0.7761194 0.8 0.84375 0.78125 0.8 ] mean value: 0.8134103942954909 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.84848485 0.78125 0.79411765 0.82352941 0.75675676 0.72222222 0.82758621 0.84375 0.78125 0.73684211] mean value: 0.7915789198447066 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.78125 0.84375 0.875 0.90322581 0.83870968 0.77419355 0.84375 0.78125 0.875 ] mean value: 0.8391129032258065 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.859375 0.78125 0.8125 0.84375 0.8109879 0.76310484 0.80897177 0.84122984 0.77772177 0.77620968] mean value: 0.8075100806451613 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75675676 0.64102564 0.69230769 0.73684211 0.7 0.63414634 0.66666667 0.72972973 0.64102564 0.66666667] mean value: 0.6865167240905367 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.71 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01775193 0.01068568 0.01076674 0.01071692 0.01126504 0.01195621 0.01207042 0.01220584 0.01247931 0.01210117] mean value: 0.012199926376342773 key: score_time value: [0.00915742 0.00898027 0.00895357 0.00890779 0.00897694 0.00968862 0.00974393 0.00967693 0.00982428 0.00974512] mean value: 0.009365487098693847 key: test_mcc value: [0.4375 0.56694671 0.50097943 0.53150959 0.5026181 0.68415777 0.39717742 0.56086231 0.46309616 0.35186681] mean value: 0.4996714294281916 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.71875 0.78125 0.75 0.765625 0.74603175 0.84126984 0.6984127 0.77777778 0.73015873 0.66666667] mean value: 0.7475942460317461 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.71875 0.76666667 0.75757576 0.76190476 0.76470588 0.84375 0.6984127 0.76666667 0.72131148 0.72 ] mean value: 0.7519743908989328 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.71875 0.82142857 0.73529412 0.77419355 0.7027027 0.81818182 0.6875 0.82142857 0.75862069 0.62790698] mean value: 0.7466006996175177 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.71875 0.71875 0.78125 0.75 0.83870968 0.87096774 0.70967742 0.71875 0.6875 0.84375 ] mean value: 0.7638104838709677 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.71875 0.78125 0.75 0.765625 0.74747984 0.84173387 0.69858871 0.77872984 0.73084677 0.66381048] mean value: 0.7476814516129032 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.56097561 0.62162162 0.6097561 0.61538462 0.61904762 0.72972973 0.53658537 0.62162162 0.56410256 0.5625 ] mean value: 0.6041324844678503 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.23 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [2.1599102 2.15545416 2.1110487 2.12321568 2.14602542 2.12714887 2.07959056 2.1073513 2.10407662 2.11057091] mean value: 2.122439241409302 key: score_time value: [0.10290813 0.10166049 0.09488583 0.10192132 0.10095882 0.09461808 0.09486747 0.09576178 0.14553905 0.09514618] mean value: 0.10282671451568604 key: test_mcc value: [0.93933644 0.75146915 0.78163175 0.90669283 0.82507166 0.72407013 0.78160117 0.90524194 0.77800241 0.78719616] mean value: 0.8180313634790382 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96875 0.875 0.890625 0.953125 0.9047619 0.85714286 0.88888889 0.95238095 0.88888889 0.88888889] mean value: 0.9068452380952381 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96969697 0.87878788 0.88888889 0.95238095 0.91176471 0.86567164 0.89230769 0.95238095 0.89230769 0.89855072] mean value: 0.9102738099062105 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94117647 0.85294118 0.90322581 0.96774194 0.83783784 0.80555556 0.85294118 0.96774194 0.87878788 0.83783784] mean value: 0.8845787610967877 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90625 0.875 0.9375 1. 0.93548387 0.93548387 0.9375 0.90625 0.96875 ] mean value: 0.9402217741935484 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96875 0.875 0.890625 0.953125 0.90625 0.85836694 0.88961694 0.95262097 0.88860887 0.88760081] mean value: 0.9070564516129033 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94117647 0.78378378 0.8 0.90909091 0.83783784 0.76315789 0.80555556 0.90909091 0.80555556 0.81578947] mean value: 0.8371038389923839 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000...05', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [1.00224113 1.03473115 1.0149684 1.03896141 1.01002407 1.00975657 1.02722478 1.11543322 1.08370209 1.00851583] mean value: 1.0345558643341064 key: score_time value: [0.28700686 0.21716261 0.2733736 0.23618817 0.25626564 0.19256282 0.22805929 0.24182963 0.19579506 0.23420334] mean value: 0.23624470233917236 key: test_mcc value: [0.90669283 0.72192954 0.75146915 0.84748251 0.78822824 0.72407013 0.81130213 0.84173387 0.77800241 0.75156646] mean value: 0.7922477281849055 key: train_mcc value: [0.94133067 0.9340293 0.9233756 0.93039747 0.93777673 0.94115006 0.92690126 0.92690929 0.93052245 0.93031595] mean value: 0.932270878516129 key: test_accuracy value: [0.953125 0.859375 0.875 0.921875 0.88888889 0.85714286 0.9047619 0.92063492 0.88888889 0.87301587] mean value: 0.8942708333333333 key: train_accuracy value: [0.97017544 0.96666667 0.96140351 0.96491228 0.96847636 0.97022767 0.96322242 0.96322242 0.96497373 0.96497373] mean value: 0.9658254216978523 key: test_fscore value: [0.95384615 0.86567164 0.87096774 0.91803279 0.89552239 0.86567164 0.90625 0.92063492 0.89230769 0.88235294] mean value: 0.8971257908427758 key: train_fscore value: [0.97084048 0.96729776 0.96206897 0.96551724 0.96917808 0.97084048 0.96385542 0.96373057 0.96551724 0.96539792] mean value: 0.9664244169005379 key: test_precision value: [0.93939394 0.82857143 0.9 0.96551724 0.83333333 0.80555556 0.87878788 0.93548387 0.87878788 0.83333333] mean value: 0.87987644601104 key: train_precision value: [0.94966443 0.94932432 0.94576271 0.94915254 0.94966443 0.95286195 0.94915254 0.94897959 0.94915254 0.95221843] mean value: 0.9495933497100595 key: test_recall value: [0.96875 0.90625 0.84375 0.875 0.96774194 0.93548387 0.93548387 0.90625 0.90625 0.9375 ] mean value: 0.9182459677419355 key: train_recall value: [0.99298246 0.98596491 0.97894737 0.98245614 0.98951049 0.98951049 0.97902098 0.97894737 0.98245614 0.97894737] mean value: 0.9838743712427923 key: test_roc_auc value: [0.953125 0.859375 0.875 0.921875 0.89012097 0.85836694 0.90524194 0.92086694 0.88860887 0.87197581] mean value: 0.8944556451612904 key: train_roc_auc value: [0.97017544 0.96666667 0.96140351 0.96491228 0.96843946 0.97019384 0.9631947 0.96324991 0.96500429 0.96499816] mean value: 0.9658238252975095 key: test_jcc value: [0.91176471 0.76315789 0.77142857 0.84848485 0.81081081 0.76315789 0.82857143 0.85294118 0.80555556 0.78947368] mean value: 0.8145346570888367 key: train_jcc value: [0.94333333 0.93666667 0.9269103 0.93333333 0.94019934 0.94333333 0.93023256 0.93 0.93333333 0.93311037] mean value: 0.9350452560584006 MCC on Blind test: 0.69 Accuracy on Blind test: 0.85 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02679944 0.01057577 0.01054859 0.01065159 0.01063442 0.01070833 0.01059222 0.0106802 0.01059055 0.01068306] mean value: 0.012246417999267577 key: score_time value: [0.00978088 0.0090704 0.01042318 0.00912976 0.00906348 0.00907946 0.00904584 0.00904346 0.00901008 0.0090301 ] mean value: 0.009267663955688477 key: test_mcc value: [0.4113018 0.34391797 0.56694671 0.6011334 0.52679717 0.42986904 0.39656932 0.42986904 0.40025188 0.38166127] mean value: 0.44883175921777024 key: train_mcc value: [0.55727849 0.55087719 0.55689066 0.54886043 0.51705741 0.53466669 0.55324733 0.5320108 0.51729972 0.55982989] mean value: 0.542801862027514 key: test_accuracy value: [0.703125 0.671875 0.78125 0.796875 0.76190476 0.71428571 0.6984127 0.71428571 0.6984127 0.68253968] mean value: 0.722296626984127 key: train_accuracy value: [0.77719298 0.7754386 0.77719298 0.77368421 0.75831874 0.76707531 0.77583187 0.76532399 0.75831874 0.7793345 ] mean value: 0.7707711924294097 key: test_fscore value: [0.6779661 0.66666667 0.79411765 0.8115942 0.76923077 0.71875 0.68852459 0.70967742 0.72463768 0.72972973] mean value: 0.7290894807957649 key: train_fscore value: [0.78797997 0.7754386 0.78726968 0.78172589 0.76369863 0.77264957 0.78451178 0.77288136 0.76369863 0.78571429] mean value: 0.7775568392250982 key: test_precision value: [0.74074074 0.67741935 0.75 0.75675676 0.73529412 0.6969697 0.7 0.73333333 0.67567568 0.64285714] mean value: 0.7109046818819115 key: train_precision value: [0.75159236 0.7754386 0.75320513 0.75490196 0.74832215 0.75585284 0.75649351 0.74754098 0.7458194 0.76237624] mean value: 0.7551543158346077 key: test_recall value: [0.625 0.65625 0.84375 0.875 0.80645161 0.74193548 0.67741935 0.6875 0.78125 0.84375 ] mean value: 0.7538306451612903 key: train_recall value: [0.82807018 0.7754386 0.8245614 0.81052632 0.77972028 0.79020979 0.81468531 0.8 0.78245614 0.81052632] mean value: 0.8016194331983806 key: test_roc_auc value: [0.703125 0.671875 0.78125 0.796875 0.76260081 0.71471774 0.69808468 0.71471774 0.69707661 0.67993952] mean value: 0.7220262096774194 key: train_roc_auc value: [0.77719298 0.7754386 0.77719298 0.77368421 0.75828119 0.76703472 0.77576371 0.76538462 0.75836094 0.77938903] mean value: 0.7707722978775611 key: test_jcc value: [0.51282051 0.5 0.65853659 0.68292683 0.625 0.56097561 0.525 0.55 0.56818182 0.57446809] mean value: 0.5757909440498958 key: train_jcc value: [0.65013774 0.63323782 0.64917127 0.64166667 0.61772853 0.62952646 0.64542936 0.62983425 0.61772853 0.64705882] mean value: 0.63615194674427 MCC on Blind test: 0.36 Accuracy on Blind test: 0.69 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.1093576 0.09363127 0.09794664 0.09333253 0.24359059 0.09687424 0.09973836 0.10169721 0.12564659 0.0984056 ] mean value: 0.11602206230163574 key: score_time value: [0.01119781 0.01130223 0.01117301 0.01125622 0.01114964 0.01119208 0.01138306 0.01121926 0.01116014 0.01131511] mean value: 0.011234855651855469 key: test_mcc value: [0.93933644 0.84416229 0.78470603 0.8125 0.84530217 0.81130213 0.84530217 0.96871896 0.84173387 0.74722285] mean value: 0.8440286901864472 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96875 0.921875 0.890625 0.90625 0.92063492 0.9047619 0.92063492 0.98412698 0.92063492 0.87301587] mean value: 0.9211309523809523 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96969697 0.92063492 0.8852459 0.90625 0.92307692 0.90625 0.92307692 0.98461538 0.92063492 0.87878788] mean value: 0.9218269822163264 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.94117647 0.93548387 0.93103448 0.90625 0.88235294 0.87878788 0.88235294 0.96969697 0.93548387 0.85294118] mean value: 0.9115560602590718 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.90625 0.84375 0.90625 0.96774194 0.93548387 0.96774194 1. 0.90625 0.90625 ] mean value: 0.9339717741935484 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96875 0.921875 0.890625 0.90625 0.92137097 0.90524194 0.92137097 0.98387097 0.92086694 0.87247984] mean value: 0.9212701612903226 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.94117647 0.85294118 0.79411765 0.82857143 0.85714286 0.82857143 0.85714286 0.96969697 0.85294118 0.78378378] mean value: 0.856608579549756 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.74 Accuracy on Blind test: 0.87 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0467267 0.1044414 0.06506896 0.09787321 0.08672071 0.05564761 0.08345127 0.05584645 0.08797264 0.09393358] mean value: 0.07776825428009033 key: score_time value: [0.01902127 0.02189255 0.01270485 0.02597713 0.03544736 0.01601791 0.01896071 0.01225686 0.02112865 0.01939201] mean value: 0.020279932022094726 key: test_mcc value: [0.53150959 0.40644851 0.62994079 0.75 0.60364273 0.62475802 0.52419355 0.61895161 0.68740835 0.62469891] mean value: 0.6001552066732136 key: train_mcc value: [0.80059654 0.83258409 0.83571043 0.81585011 0.82950084 0.83562019 0.82558492 0.81989894 0.79756669 0.82986082] mean value: 0.8222773553965407 key: test_accuracy value: [0.765625 0.703125 0.8125 0.875 0.79365079 0.80952381 0.76190476 0.80952381 0.84126984 0.79365079] mean value: 0.796577380952381 key: train_accuracy value: [0.9 0.91578947 0.91754386 0.90701754 0.91418564 0.91768827 0.91243433 0.9089317 0.89842382 0.91418564] mean value: 0.9106200264233263 key: test_fscore value: [0.76190476 0.70769231 0.8 0.875 0.8115942 0.81818182 0.76190476 0.8125 0.85294118 0.82666667] mean value: 0.8028385695719455 key: train_fscore value: [0.90189329 0.91780822 0.91910499 0.91001698 0.9165247 0.91882556 0.91438356 0.91186441 0.90034364 0.9165247 ] mean value: 0.9127290052032038 key: test_precision value: [0.77419355 0.6969697 0.85714286 0.875 0.73684211 0.77142857 0.75 0.8125 0.80555556 0.72093023] mean value: 0.7800562567305075 key: train_precision value: [0.88513514 0.89632107 0.90202703 0.88157895 0.89368771 0.90784983 0.89597315 0.88196721 0.88215488 0.89072848] mean value: 0.8917423443210672 key: test_recall value: [0.75 0.71875 0.75 0.875 0.90322581 0.87096774 0.77419355 0.8125 0.90625 0.96875 ] mean value: 0.8329637096774194 key: train_recall value: [0.91929825 0.94035088 0.93684211 0.94035088 0.94055944 0.93006993 0.93356643 0.94385965 0.91929825 0.94385965] mean value: 0.9348055453318611 key: test_roc_auc value: [0.765625 0.703125 0.8125 0.875 0.7953629 0.81048387 0.76209677 0.80947581 0.84022177 0.79082661] mean value: 0.7964717741935484 key: train_roc_auc value: [0.9 0.91578947 0.91754386 0.90701754 0.91413937 0.91766654 0.91239725 0.90899276 0.89846031 0.91423752] mean value: 0.9106244632560422 key: test_jcc value: [0.61538462 0.54761905 0.66666667 0.77777778 0.68292683 0.69230769 0.61538462 0.68421053 0.74358974 0.70454545] mean value: 0.6730412968859696 key: train_jcc value: [0.82131661 0.84810127 0.85031847 0.83489097 0.84591195 0.84984026 0.84227129 0.83800623 0.81875 0.84591195] mean value: 0.8395318996179627 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01548767 0.01054072 0.01008558 0.0100286 0.01017189 0.01033568 0.01030731 0.01033974 0.01020432 0.01021051] mean value: 0.01077120304107666 key: score_time value: [0.01049852 0.00919342 0.00898242 0.00896382 0.00888681 0.00889635 0.00888109 0.00895309 0.00899196 0.0088768 ] mean value: 0.00911242961883545 key: test_mcc value: [0.40644851 0.5336001 0.46897905 0.56360186 0.5485062 0.39656932 0.58770161 0.61982085 0.50663549 0.58371723] mean value: 0.5215580225467669 key: train_mcc value: [0.57766536 0.54579833 0.54401001 0.51829689 0.54017741 0.56212452 0.54056044 0.5519492 0.56071673 0.55034238] mean value: 0.5491641282086239 key: test_accuracy value: [0.703125 0.765625 0.734375 0.78125 0.76190476 0.6984127 0.79365079 0.80952381 0.74603175 0.77777778] mean value: 0.7571676587301587 key: train_accuracy value: [0.7877193 0.77192982 0.77017544 0.75789474 0.76882662 0.7793345 0.76882662 0.77408056 0.7793345 0.77408056] mean value: 0.7732202660767505 key: test_fscore value: [0.6984127 0.7761194 0.73846154 0.78787879 0.78873239 0.68852459 0.79365079 0.81818182 0.77777778 0.81081081] mean value: 0.7678550612689431 key: train_fscore value: [0.79663866 0.78114478 0.7827529 0.76923077 0.78 0.79139073 0.7807309 0.78606965 0.78787879 0.78319328] mean value: 0.7839030450411416 key: test_precision value: [0.70967742 0.74285714 0.72727273 0.76470588 0.7 0.7 0.78125 0.79411765 0.7 0.71428571] mean value: 0.7334166533182188 key: train_precision value: [0.76451613 0.75080906 0.74213836 0.73482428 0.74522293 0.75157233 0.74367089 0.74528302 0.75728155 0.7516129 ] mean value: 0.7486931454999034 key: test_recall value: [0.6875 0.8125 0.75 0.8125 0.90322581 0.67741935 0.80645161 0.84375 0.875 0.9375 ] mean value: 0.8105846774193548 key: train_recall value: [0.83157895 0.81403509 0.82807018 0.80701754 0.81818182 0.83566434 0.82167832 0.83157895 0.82105263 0.81754386] mean value: 0.8226401668506932 key: test_roc_auc value: [0.703125 0.765625 0.734375 0.78125 0.7641129 0.69808468 0.79385081 0.80897177 0.74395161 0.77520161] mean value: 0.7568548387096774 key: train_roc_auc value: [0.7877193 0.77192982 0.77017544 0.75789474 0.76874003 0.77923568 0.7687339 0.77418108 0.77940743 0.77415655] mean value: 0.7732173966384492 key: test_jcc value: [0.53658537 0.63414634 0.58536585 0.65 0.65116279 0.525 0.65789474 0.69230769 0.63636364 0.68181818] mean value: 0.62506445990049 key: train_jcc value: [0.66201117 0.64088398 0.64305177 0.625 0.63934426 0.65479452 0.64032698 0.64754098 0.65 0.64364641] mean value: 0.6446600072968279 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0133729 0.02345967 0.01946425 0.02447844 0.02136731 0.02070236 0.02127075 0.02006936 0.02137899 0.02137303] mean value: 0.020693707466125488 key: score_time value: [0.01019573 0.01148057 0.01190829 0.01195073 0.01189804 0.01192021 0.01198316 0.01190925 0.01193452 0.01192427] mean value: 0.011710476875305176 key: test_mcc value: [0.50395263 0.48038446 0.43033148 0.6644106 0.52371369 0.42753131 0.33112209 0.49391458 0.125 0.43960456] mean value: 0.44199654030263313 key: train_mcc value: [0.71010764 0.67515502 0.39816653 0.68967448 0.70150161 0.41409554 0.44601782 0.62481351 0.17548103 0.72805889] mean value: 0.5563072070052836 key: test_accuracy value: [0.75 0.734375 0.65625 0.828125 0.76190476 0.65079365 0.63492063 0.73015873 0.50793651 0.68253968] mean value: 0.6937003968253969 key: train_accuracy value: [0.84385965 0.82631579 0.63684211 0.83333333 0.8441331 0.65148862 0.67950963 0.80210158 0.53064799 0.84938704] mean value: 0.7497618828156205 key: test_fscore value: [0.76470588 0.70175439 0.74418605 0.84057971 0.75409836 0.73809524 0.46511628 0.67924528 0.06060606 0.75609756] mean value: 0.650448480739569 key: train_fscore value: [0.86115445 0.80080483 0.73359073 0.85225505 0.827853 0.74054759 0.54590571 0.77263581 0.11258278 0.86769231] mean value: 0.7115022260480378 key: test_precision value: [0.72222222 0.8 0.59259259 0.78378378 0.76666667 0.58490566 0.83333333 0.85714286 1. 0.62 ] mean value: 0.7560647116118814 key: train_precision value: [0.7752809 0.93867925 0.57926829 0.76536313 0.92640693 0.59043659 0.94017094 0.90566038 1. 0.77260274] mean value: 0.8193869139432945 key: test_recall value: [0.8125 0.625 1. 0.90625 0.74193548 1. 0.32258065 0.5625 0.03125 0.96875 ] mean value: 0.6970766129032258 key: train_recall value: [0.96842105 0.69824561 1. 0.96140351 0.74825175 0.99300699 0.38461538 0.67368421 0.05964912 0.98947368] mean value: 0.7476751318856582 key: test_roc_auc value: [0.75 0.734375 0.65625 0.828125 0.76159274 0.65625 0.63004032 0.7328629 0.515625 0.67792339] mean value: 0.694304435483871 key: train_roc_auc value: [0.84385965 0.82631579 0.63684211 0.83333333 0.84430131 0.65088946 0.68002699 0.80187707 0.52982456 0.84963195] mean value: 0.7496902220586431 key: test_jcc value: [0.61904762 0.54054054 0.59259259 0.725 0.60526316 0.58490566 0.3030303 0.51428571 0.03125 0.60784314] mean value: 0.5123758725023767 key: train_jcc value: [0.75616438 0.66778523 0.57926829 0.74254743 0.70627063 0.58799172 0.37542662 0.6295082 0.05964912 0.76630435] mean value: 0.5870915970622187 MCC on Blind test: 0.42 Accuracy on Blind test: 0.72 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.02321935 0.03090262 0.02621317 0.02358508 0.0311234 0.02431655 0.02363515 0.0268414 0.02199745 0.0251801 ] mean value: 0.025701427459716798 key: score_time value: [0.01190567 0.01201439 0.01202464 0.01195478 0.01194811 0.01680708 0.01229119 0.01195621 0.01197958 0.01200938] mean value: 0.012489104270935058 key: test_mcc value: [0.56360186 0.32025631 0.65915306 0.625 0.40305948 0.65821474 0.34495882 0.77800241 0.48255984 0.47783651] mean value: 0.5312643032230694 key: train_mcc value: [0.76440851 0.5666306 0.71428268 0.74152034 0.54722946 0.72499438 0.43617489 0.77291774 0.67680962 0.78291863] mean value: 0.6727886849719236 key: test_accuracy value: [0.78125 0.625 0.828125 0.8125 0.68253968 0.82539683 0.63492063 0.88888889 0.73015873 0.73015873] mean value: 0.7538938492063492 key: train_accuracy value: [0.87719298 0.75263158 0.84385965 0.87017544 0.73204904 0.85639229 0.66024518 0.88616462 0.83012259 0.89141856] mean value: 0.8200251943343473 key: test_fscore value: [0.78787879 0.45454545 0.8358209 0.8125 0.58333333 0.80701754 0.71604938 0.89230769 0.69090909 0.76712329] mean value: 0.7347485468743679 key: train_fscore value: [0.88636364 0.68027211 0.86244204 0.86642599 0.63657957 0.84230769 0.74673629 0.88812392 0.8086785 0.89198606] mean value: 0.810991582332734 key: test_precision value: [0.76470588 0.83333333 0.8 0.8125 0.82352941 0.88461538 0.58 0.87878788 0.82608696 0.68292683] mean value: 0.7886485676644276 key: train_precision value: [0.82477341 0.96153846 0.77071823 0.89219331 0.99259259 0.93589744 0.59583333 0.87162162 0.92342342 0.88581315] mean value: 0.8654404971687462 key: test_recall value: [0.8125 0.3125 0.875 0.8125 0.4516129 0.74193548 0.93548387 0.90625 0.59375 0.875 ] mean value: 0.7316532258064516 key: train_recall value: [0.95789474 0.52631579 0.97894737 0.84210526 0.46853147 0.76573427 1. 0.90526316 0.71929825 0.89824561] mean value: 0.8062335909704331 key: test_roc_auc value: [0.78125 0.625 0.828125 0.8125 0.67893145 0.82409274 0.63961694 0.88860887 0.73235887 0.72782258] mean value: 0.7538306451612904 key: train_roc_auc value: [0.87719298 0.75263158 0.84385965 0.87017544 0.73251135 0.85655134 0.65964912 0.88619801 0.82992884 0.8914305 ] mean value: 0.8200128818549871 key: test_jcc value: [0.65 0.29411765 0.71794872 0.68421053 0.41176471 0.67647059 0.55769231 0.80555556 0.52777778 0.62222222] mean value: 0.5947760048688842 key: train_jcc value: [0.79591837 0.51546392 0.75815217 0.76433121 0.46689895 0.72757475 0.59583333 0.79876161 0.67880795 0.80503145] mean value: 0.6906773711312438 MCC on Blind test: 0.48 Accuracy on Blind test: 0.74 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.21017671 0.19325519 0.19400144 0.19451475 0.19573307 0.1954298 0.19602394 0.19409275 0.19402218 0.19568753] mean value: 0.19629373550415039 key: score_time value: [0.01554799 0.01554918 0.01552463 0.01568484 0.01562786 0.01594067 0.01572442 0.01567841 0.01569891 0.0156548 ] mean value: 0.01566317081451416 key: test_mcc value: [0.91025899 0.8125 0.75 0.78163175 0.74772995 0.84530217 0.82507166 0.87462485 0.79701677 0.81572458] mean value: 0.8159860715975866 key: train_mcc value: [0.91599249 0.95453287 0.95453287 0.9579891 0.9439578 0.95154401 0.94404909 0.94416837 0.95806341 0.95105762] mean value: 0.9475887624403511 key: test_accuracy value: [0.953125 0.90625 0.875 0.890625 0.87301587 0.92063492 0.9047619 0.93650794 0.88888889 0.9047619 ] mean value: 0.9053571428571429 key: train_accuracy value: [0.95789474 0.97719298 0.97719298 0.97894737 0.97197898 0.97548161 0.97197898 0.97197898 0.97898424 0.97548161] mean value: 0.9737112483485422 key: test_fscore value: [0.95522388 0.90625 0.875 0.89230769 0.875 0.92307692 0.91176471 0.93939394 0.90140845 0.91176471] mean value: 0.9091190297844501 key: train_fscore value: [0.95833333 0.9773913 0.9773913 0.97909408 0.97202797 0.97594502 0.97222222 0.97222222 0.97909408 0.97560976] mean value: 0.9739331285091197 key: test_precision value: [0.91428571 0.90625 0.875 0.87878788 0.84848485 0.88235294 0.83783784 0.91176471 0.82051282 0.86111111] mean value: 0.8736387858079034 key: train_precision value: [0.94845361 0.96896552 0.96896552 0.97231834 0.97202797 0.95945946 0.96551724 0.96219931 0.97231834 0.96885813] mean value: 0.9659083438000281 key: test_recall value: [1. 0.90625 0.875 0.90625 0.90322581 0.96774194 1. 0.96875 1. 0.96875 ] mean value: 0.9495967741935484 key: train_recall value: [0.96842105 0.98596491 0.98596491 0.98596491 0.97202797 0.99300699 0.97902098 0.98245614 0.98596491 0.98245614] mean value: 0.9821248926512084 key: test_roc_auc value: [0.953125 0.90625 0.875 0.890625 0.8734879 0.92137097 0.90625 0.9359879 0.88709677 0.90372984] mean value: 0.9052923387096774 key: train_roc_auc value: [0.95789474 0.97719298 0.97719298 0.97894737 0.9719789 0.97545086 0.97196663 0.9719973 0.97899644 0.9754938 ] mean value: 0.9737112010796222 key: test_jcc value: [0.91428571 0.82857143 0.77777778 0.80555556 0.77777778 0.85714286 0.83783784 0.88571429 0.82051282 0.83783784] mean value: 0.8343013893013893 key: train_jcc value: [0.92 0.95578231 0.95578231 0.95904437 0.94557823 0.95302013 0.94594595 0.94594595 0.95904437 0.95238095] mean value: 0.9492524572845255 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.09946442 0.09689593 0.09891295 0.12208009 0.12569952 0.11471534 0.09828115 0.1115005 0.09501028 0.12196279] mean value: 0.10845229625701905 key: score_time value: [0.02143168 0.035707 0.0302124 0.03004122 0.03268075 0.02089787 0.04332209 0.03567576 0.04053068 0.04182529] mean value: 0.033232474327087404 key: test_mcc value: [0.87671401 0.84416229 0.78163175 0.78163175 0.78160117 0.78822824 0.78160117 0.78160117 0.87462485 0.68352185] mean value: 0.79753182497422 key: train_mcc value: [0.99649736 0.97897147 0.98246219 0.98598919 0.99299472 0.98254074 0.98601347 0.98949822 0.98601347 0.98598945] mean value: 0.9866970290331141 key: test_accuracy value: [0.9375 0.921875 0.890625 0.890625 0.88888889 0.88888889 0.88888889 0.88888889 0.93650794 0.84126984] mean value: 0.8973958333333333 key: train_accuracy value: [0.99824561 0.98947368 0.99122807 0.99298246 0.99649737 0.99124343 0.99299475 0.99474606 0.99299475 0.99299475] mean value: 0.9933400927888899 key: test_fscore value: [0.93939394 0.92307692 0.89230769 0.88888889 0.89230769 0.89552239 0.89230769 0.8852459 0.93939394 0.84848485] mean value: 0.8996929905860662 key: train_fscore value: [0.99824869 0.98951049 0.99124343 0.99295775 0.9965035 0.99130435 0.99303136 0.99474606 0.99295775 0.99298246] mean value: 0.9933485820457163 key: test_precision value: [0.91176471 0.90909091 0.87878788 0.90322581 0.85294118 0.83333333 0.85294118 0.93103448 0.91176471 0.82352941] mean value: 0.8808413586892943 key: train_precision value: [0.9965035 0.98606272 0.98951049 0.99646643 0.9965035 0.98615917 0.98958333 0.99300699 0.99646643 0.99298246] mean value: 0.992324501450918 key: test_recall value: [0.96875 0.9375 0.90625 0.875 0.93548387 0.96774194 0.93548387 0.84375 0.96875 0.875 ] mean value: 0.9213709677419355 key: train_recall value: [1. 0.99298246 0.99298246 0.98947368 0.9965035 0.9965035 0.9965035 0.99649123 0.98947368 0.99298246] mean value: 0.994389645442277 key: test_roc_auc value: [0.9375 0.921875 0.890625 0.890625 0.88961694 0.89012097 0.88961694 0.88961694 0.9359879 0.84072581] mean value: 0.8976310483870967 key: train_roc_auc value: [0.99824561 0.98947368 0.99122807 0.99298246 0.99649736 0.9912342 0.99298859 0.99474911 0.99298859 0.99299472] mean value: 0.9933382407066618 key: test_jcc value: [0.88571429 0.85714286 0.80555556 0.8 0.80555556 0.81081081 0.80555556 0.79411765 0.88571429 0.73684211] mean value: 0.8187008658370888 key: train_jcc value: [0.9965035 0.97923875 0.98263889 0.98601399 0.99303136 0.98275862 0.98615917 0.98954704 0.98601399 0.98606272] mean value: 0.9867968016968023 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.21038747 0.17521906 0.21723461 0.22550559 0.21980762 0.23454189 0.23472786 0.23469305 0.25624919 0.23777986] mean value: 0.22461462020874023 key: score_time value: [0.02699494 0.01949954 0.02698517 0.02703357 0.02720189 0.0272665 0.0273447 0.02730489 0.03539729 0.02732086] mean value: 0.027234935760498048 key: test_mcc value: [0.6875 0.38729833 0.50395263 0.63628476 0.64134943 0.47384924 0.61895161 0.52371369 0.42871785 0.43470518] mean value: 0.5336322729777344 key: train_mcc value: [0.96512618 0.95848494 0.96169363 0.95827234 0.965351 0.965351 0.97235938 0.95834669 0.96161964 0.96862577] mean value: 0.963523056967868 key: test_accuracy value: [0.84375 0.6875 0.75 0.8125 0.80952381 0.73015873 0.80952381 0.76190476 0.71428571 0.71428571] mean value: 0.763343253968254 key: train_accuracy value: [0.98245614 0.97894737 0.98070175 0.97894737 0.98248687 0.98248687 0.98598949 0.97898424 0.98073555 0.98423818] mean value: 0.9815973822472117 key: test_fscore value: [0.84375 0.72222222 0.76470588 0.82857143 0.82857143 0.75362319 0.80645161 0.76923077 0.72727273 0.74285714] mean value: 0.7787256402387683 key: train_fscore value: [0.98263889 0.97931034 0.98093588 0.97923875 0.98275862 0.98275862 0.9862069 0.97923875 0.98086957 0.98434783] mean value: 0.9818304146819015 key: test_precision value: [0.84375 0.65 0.72222222 0.76315789 0.74358974 0.68421053 0.80645161 0.75757576 0.70588235 0.68421053] mean value: 0.7361050636600547 key: train_precision value: [0.97250859 0.96271186 0.96917808 0.96587031 0.96938776 0.96938776 0.97278912 0.96587031 0.97241379 0.97586207] mean value: 0.9695979639917629 key: test_recall value: [0.84375 0.8125 0.8125 0.90625 0.93548387 0.83870968 0.80645161 0.78125 0.75 0.8125 ] mean value: 0.8299395161290323 key: train_recall value: [0.99298246 0.99649123 0.99298246 0.99298246 0.9965035 0.9965035 1. 0.99298246 0.98947368 0.99298246] mean value: 0.9943884185989449 key: test_roc_auc value: [0.84375 0.6875 0.75 0.8125 0.81149194 0.73185484 0.80947581 0.76159274 0.71370968 0.71270161] mean value: 0.7634576612903226 key: train_roc_auc value: [0.98245614 0.97894737 0.98070175 0.97894737 0.98246227 0.98246227 0.98596491 0.97900871 0.98075083 0.98425347] mean value: 0.9815955097534045 key: test_jcc value: [0.72972973 0.56521739 0.61904762 0.70731707 0.70731707 0.60465116 0.67567568 0.625 0.57142857 0.59090909] mean value: 0.6396293387227195 key: train_jcc value: [0.96587031 0.95945946 0.96258503 0.95932203 0.96610169 0.96610169 0.97278912 0.95932203 0.96245734 0.96917808] mean value: 0.9643186793989418 MCC on Blind test: 0.35 Accuracy on Blind test: 0.68 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.85160112 0.84095502 0.84922051 0.83509898 0.83196425 0.82911921 0.82614064 0.82878995 0.82055306 0.81654668] mean value: 0.8329989433288574 key: score_time value: [0.01016712 0.01039219 0.01057267 0.01019001 0.00998306 0.00996137 0.00951242 0.00953674 0.00991821 0.01015687] mean value: 0.010039067268371582 key: test_mcc value: [0.93933644 0.875 0.81409158 0.875 0.85238636 0.84530217 0.84530217 0.93832585 0.84484323 0.68245968] mean value: 0.8512047470887891 key: train_mcc value: [0.99300691 0.99298246 0.99649736 0.99649736 1. 0.98949822 0.98949822 0.99301901 0.99299472 0.98598945] mean value: 0.9929983708938576 key: test_accuracy value: [0.96875 0.9375 0.90625 0.9375 0.92063492 0.92063492 0.92063492 0.96825397 0.92063492 0.84126984] mean value: 0.9242063492063491 key: train_accuracy value: [0.99649123 0.99649123 0.99824561 0.99824561 1. 0.99474606 0.99474606 0.99649737 0.99649737 0.99299475] mean value: 0.9964955295418932 key: test_fscore value: [0.96969697 0.9375 0.90322581 0.9375 0.92537313 0.92307692 0.92307692 0.96969697 0.92537313 0.84375 ] mean value: 0.9258269860656114 key: train_fscore value: [0.99647887 0.99649123 0.99824253 0.99824869 1. 0.99474606 0.99474606 0.99647887 0.99649123 0.99298246] mean value: 0.996490599511949 key: test_precision value: [0.94117647 0.9375 0.93333333 0.9375 0.86111111 0.88235294 0.88235294 0.94117647 0.88571429 0.84375 ] mean value: 0.9045967553688142 key: train_precision value: [1. 0.99649123 1. 0.9965035 1. 0.99649123 0.99649123 1. 0.99649123 0.99298246] mean value: 0.9975450864924549 key: test_recall value: [1. 0.9375 0.875 0.9375 1. 0.96774194 0.96774194 1. 0.96875 0.84375 ] mean value: 0.9497983870967742 key: train_recall value: [0.99298246 0.99649123 0.99649123 1. 1. 0.99300699 0.99300699 0.99298246 0.99649123 0.99298246] mean value: 0.9954435038645565 key: test_roc_auc value: [0.96875 0.9375 0.90625 0.9375 0.921875 0.92137097 0.92137097 0.96774194 0.91985887 0.84122984] mean value: 0.9243447580645161 key: train_roc_auc value: [0.99649123 0.99649123 0.99824561 0.99824561 1. 0.99474911 0.99474911 0.99649123 0.99649736 0.99299472] mean value: 0.9964955220218378 key: test_jcc value: [0.94117647 0.88235294 0.82352941 0.88235294 0.86111111 0.85714286 0.85714286 0.94117647 0.86111111 0.72972973] mean value: 0.8636825901531784 key: train_jcc value: [0.99298246 0.99300699 0.99649123 0.9965035 1. 0.98954704 0.98954704 0.99298246 0.99300699 0.98606272] mean value: 0.9930130417293447 MCC on Blind test: 0.7 Accuracy on Blind test: 0.86 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.03198886 0.03319812 0.03335285 0.03351092 0.03357315 0.03336334 0.03523016 0.03520679 0.03442335 0.06159759] mean value: 0.036544513702392575 key: score_time value: [0.01272321 0.01282072 0.01404047 0.02016497 0.01373696 0.01378155 0.0173955 0.01282907 0.01376104 0.01932311] mean value: 0.015057659149169922 key: test_mcc value: [0.21442251 0.18442778 0.43033148 0.35043832 0.22008521 0.12607181 0.21117195 0.36114822 0.3592106 0.18084933] mean value: 0.2638157215951625 key: train_mcc value: [0.3365728 0.35876576 0.31683766 0.32679675 0.34005692 0.34644988 0.33683398 0.33242623 0.32917725 0.34206181] mean value: 0.33659790354910774 key: test_accuracy value: [0.5625 0.578125 0.65625 0.609375 0.53968254 0.52380952 0.55555556 0.65079365 0.61904762 0.55555556] mean value: 0.5850694444444444 key: train_accuracy value: [0.60175439 0.61403509 0.59122807 0.59649123 0.60420315 0.60770578 0.60245184 0.59894921 0.5971979 0.60420315] mean value: 0.6018219805204781 key: test_fscore value: [0.68888889 0.66666667 0.74418605 0.71910112 0.68131868 0.66666667 0.68181818 0.73170732 0.72727273 0.68888889] mean value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 0.6996515188701006 key: train_fscore value: [0.71518193 0.72151899 0.70983811 0.7125 0.71679198 0.71859296 0.71589487 0.71339174 0.7125 0.7160804 ] mean value: 0.7152290981730446 key: test_precision value: [0.53448276 0.55102041 0.59259259 0.56140351 0.51666667 0.50847458 0.52631579 0.6 0.57142857 0.53448276] mean value: 0.5496867630609276 key: train_precision value: [0.55664062 0.56435644 0.55019305 0.55339806 0.55859375 0.56078431 0.55750487 0.55447471 0.55339806 0.55772994] mean value: 0.5567073813824097 key: test_recall value: [0.96875 0.84375 1. 1. 1. 0.96774194 0.96774194 0.9375 1. 0.96875 ] mean value: 0.9654233870967742 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.5625 0.578125 0.65625 0.609375 0.546875 0.53074597 0.56199597 0.64616935 0.61290323 0.54889113] mean value: 0.5853830645161291 key: train_roc_auc value: [0.60175439 0.61403509 0.59122807 0.59649123 0.60350877 0.60701754 0.60175439 0.59965035 0.5979021 0.6048951 ] mean value: 0.6018237026131763 key: test_jcc value: [0.52542373 0.5 0.59259259 0.56140351 0.51666667 0.5 0.51724138 0.57692308 0.57142857 0.52542373] mean value: 0.5387103253320301 key: train_jcc value: [0.55664062 0.56435644 0.55019305 0.55339806 0.55859375 0.56078431 0.55750487 0.55447471 0.55339806 0.55772994] mean value: 0.5567073813824097 MCC on Blind test: 0.04 Accuracy on Blind test: 0.46 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03761649 0.04023623 0.04000401 0.04022217 0.04029584 0.04453206 0.04298353 0.04024267 0.04377055 0.03598547] mean value: 0.040588903427124026 key: score_time value: [0.02147245 0.01967001 0.01933289 0.01929116 0.01915932 0.01928139 0.0192802 0.01923466 0.01931143 0.01930022] mean value: 0.019533371925354003 key: test_mcc value: [0.56360186 0.56360186 0.56360186 0.72192954 0.73343622 0.71790017 0.56086231 0.56710881 0.66625621 0.6385282 ] mean value: 0.629682705082644 key: train_mcc value: [0.78678385 0.81180663 0.77666348 0.7840214 0.78933933 0.78351298 0.76948901 0.76372565 0.76304297 0.80585562] mean value: 0.7834240912338475 key: test_accuracy value: [0.78125 0.78125 0.78125 0.859375 0.85714286 0.85714286 0.77777778 0.77777778 0.82539683 0.80952381] mean value: 0.8107886904761905 key: train_accuracy value: [0.89298246 0.90526316 0.8877193 0.89122807 0.89316988 0.89141856 0.88441331 0.88091068 0.88091068 0.90192644] mean value: 0.8909942544627769 key: test_fscore value: [0.77419355 0.78787879 0.77419355 0.86567164 0.86956522 0.86153846 0.78787879 0.75862069 0.84507042 0.83333333] mean value: 0.8157944438776297 key: train_fscore value: [0.89536878 0.90784983 0.89078498 0.89455782 0.89782245 0.89383562 0.8869863 0.88474576 0.88395904 0.90508475] mean value: 0.8940995333789712 key: test_precision value: [0.8 0.76470588 0.8 0.82857143 0.78947368 0.82352941 0.74285714 0.84615385 0.76923077 0.75 ] mean value: 0.791452216514136 key: train_precision value: [0.87583893 0.88372093 0.86710963 0.8679868 0.86173633 0.87583893 0.86912752 0.8557377 0.86046512 0.87540984] mean value: 0.8692971724259259 key: test_recall value: [0.75 0.8125 0.75 0.90625 0.96774194 0.90322581 0.83870968 0.6875 0.9375 0.9375 ] mean value: 0.8490927419354839 key: train_recall value: [0.91578947 0.93333333 0.91578947 0.92280702 0.93706294 0.91258741 0.90559441 0.91578947 0.90877193 0.93684211] mean value: 0.9204367562262299 key: test_roc_auc value: [0.78125 0.78125 0.78125 0.859375 0.85887097 0.8578629 0.77872984 0.77923387 0.82358871 0.80745968] mean value: 0.8108870967741936 key: train_roc_auc value: [0.89298246 0.90526316 0.8877193 0.89122807 0.89309287 0.89138143 0.88437615 0.88097166 0.88095939 0.90198749] mean value: 0.8909961967856705 key: test_jcc value: [0.63157895 0.65 0.63157895 0.76315789 0.76923077 0.75675676 0.65 0.61111111 0.73170732 0.71428571] mean value: 0.6909407457931207 key: train_jcc value: [0.81055901 0.83125 0.80307692 0.80923077 0.81458967 0.80804954 0.79692308 0.79331307 0.79204893 0.82662539] mean value: 0.8085666363268487 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:196: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_cd_7030.py:199: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=168)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.2927134 0.30341053 0.30166936 0.41127467 0.32519579 0.31081867 0.3082726 0.35637784 0.29826283 0.31384945] mean value: 0.32218451499938966 key: score_time value: [0.01918674 0.01912594 0.01932597 0.0190556 0.01911068 0.01906705 0.01924586 0.0189929 0.0189693 0.01898503] mean value: 0.019106507301330566 key: test_mcc value: [0.59404013 0.56360186 0.56360186 0.72192954 0.73343622 0.63159952 0.56086231 0.62475802 0.71705182 0.66625621] mean value: 0.6377137491806849 key: train_mcc value: [0.80761932 0.81180663 0.77666348 0.7840214 0.78933933 0.82176407 0.80817796 0.79970671 0.79081099 0.82735112] mean value: 0.8017261012819527 key: test_accuracy value: [0.796875 0.78125 0.78125 0.859375 0.85714286 0.80952381 0.77777778 0.80952381 0.85714286 0.82539683] mean value: 0.8155257936507936 key: train_accuracy value: [0.90350877 0.90526316 0.8877193 0.89122807 0.89316988 0.91068301 0.90367776 0.89842382 0.89492119 0.91243433] mean value: 0.900102928073248 key: test_fscore value: [0.79365079 0.78787879 0.77419355 0.86567164 0.86956522 0.82352941 0.78787879 0.8 0.86567164 0.84507042] mean value: 0.8213110253068777 key: train_fscore value: [0.90533563 0.90784983 0.89078498 0.89455782 0.89782245 0.91222031 0.90598291 0.9023569 0.89726027 0.91554054] mean value: 0.9029711641867898 key: test_precision value: [0.80645161 0.76470588 0.8 0.82857143 0.78947368 0.75675676 0.74285714 0.85714286 0.82857143 0.76923077] mean value: 0.7943761562597077 key: train_precision value: [0.88851351 0.88372093 0.86710963 0.8679868 0.86173633 0.89830508 0.88628763 0.86731392 0.87625418 0.88273616] mean value: 0.8779964174357806 key: test_recall value: [0.78125 0.8125 0.75 0.90625 0.96774194 0.90322581 0.83870968 0.75 0.90625 0.9375 ] mean value: 0.8553427419354839 key: train_recall value: [0.92280702 0.93333333 0.91578947 0.92280702 0.93706294 0.92657343 0.92657343 0.94035088 0.91929825 0.95087719] mean value: 0.9295472948104527 key: test_roc_auc value: [0.796875 0.78125 0.78125 0.859375 0.85887097 0.8109879 0.77872984 0.81048387 0.85635081 0.82358871] mean value: 0.8157762096774194 key: train_roc_auc value: [0.90350877 0.90526316 0.8877193 0.89122807 0.89309287 0.91065513 0.90363759 0.89849712 0.89496381 0.91250153] mean value: 0.9001067353698933 key: test_jcc value: [0.65789474 0.65 0.63157895 0.76315789 0.76923077 0.7 0.65 0.66666667 0.76315789 0.73170732] mean value: 0.6983394226654818 key: train_jcc value: [0.82704403 0.83125 0.80307692 0.80923077 0.81458967 0.83860759 0.828125 0.82208589 0.8136646 0.84423676] mean value: 0.8231911224023584 MCC on Blind test: 0.45 Accuracy on Blind test: 0.73