/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_orig.py:550: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead. from pandas import MultiIndex, Int64Index 1.22.4 1.4.1 aaindex_df contains non-numerical data Total no. of non-numerial columns: 2 Selecting numerical data only PASS: successfully selected numerical columns only for aaindex_df Now checking for NA in the remaining aaindex_cols Counting aaindex_df cols with NA ncols with NA: 4 columns Dropping these... Original ncols: 127 Revised df ncols: 123 Checking NA in revised df... PASS: cols with NA successfully dropped from aaindex_df Proceeding with combining aa_df with other features_df PASS: ncols match Expected ncols: 123 Got: 123 Total no. of columns in clean aa_df: 123 Proceeding to merge, expected nrows in merged_df: 424 PASS: my_features_df and aa_df successfully combined nrows: 424 ncols: 265 count of NULL values before imputation or_mychisq 102 log10_or_mychisq 102 dtype: int64 count of NULL values AFTER imputation mutationinformation 0 or_rawI 0 logorI 0 dtype: int64 PASS: OR values imputed, data ready for ML Total no. of features for aaindex: 123 No. of numerical features: 166 No. of categorical features: 7 index: 0 ind: 1 Mask count check: True Original Data Counter({1: 114, 0: 71}) Data dim: (185, 173) ------------------------------------------------------------- Successfully split data: ORIGINAL training actual values: training set imputed values: blind test set Train data size: (185, 173) Test data size: (239, 173) y_train numbers: Counter({1: 114, 0: 71}) y_train ratio: 0.6228070175438597 y_test_numbers: Counter({0: 120, 1: 119}) y_test ratio: 1.0084033613445378 ------------------------------------------------------------- Simple Random OverSampling Counter({0: 114, 1: 114}) (228, 173) Simple Random UnderSampling Counter({0: 71, 1: 71}) (142, 173) Simple Combined Over and UnderSampling Counter({0: 114, 1: 114}) (228, 173) SMOTE_NC OverSampling Counter({0: 114, 1: 114}) (228, 173) ##################################################################### Running ML analysis: ORIGINAL Gene name: pncA Drug name: pyrazinamide Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_orig/ Sanity checks: Total input features: 173 Training data size: (185, 173) Test data size: (239, 173) Target feature numbers (training data): Counter({1: 114, 0: 71}) Target features ratio (training data: 0.6228070175438597 Target feature numbers (test data): Counter({0: 120, 1: 119}) Target features ratio (test data): 1.0084033613445378 ##################################################################### ================================================================ Strucutral features (n): 34 These are: Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts'] FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss'] Other struc columns: ['rsa', 'kd_values', 'rd_values'] ================================================================ AAindex features (n): 123 These are: ['ALTS910101', 'AZAE970101', 'AZAE970102', 'BASU010101', 'BENS940101', 'BENS940102', 'BENS940103', 'BENS940104', 'BETM990101', 'BLAJ010101', 'BONM030101', 'BONM030102', 'BONM030103', 'BONM030104', 'BONM030105', 'BONM030106', 'BRYS930101', 'CROG050101', 'CSEM940101', 'DAYM780301', 'DAYM780302', 'DOSZ010101', 'DOSZ010102', 'DOSZ010103', 'DOSZ010104', 'FEND850101', 'FITW660101', 'GEOD900101', 'GIAG010101', 'GONG920101', 'GRAR740104', 'HENS920101', 'HENS920102', 'HENS920103', 'HENS920104', 'JOHM930101', 'JOND920103', 'JOND940101', 'KANM000101', 'KAPO950101', 'KESO980101', 'KESO980102', 'KOLA920101', 'KOLA930101', 'KOSJ950100_RSA_SST', 'KOSJ950100_SST', 'KOSJ950110_RSA', 'KOSJ950115', 'LEVJ860101', 'LINK010101', 'LIWA970101', 'LUTR910101', 'LUTR910102', 'LUTR910103', 'LUTR910104', 'LUTR910105', 'LUTR910106', 'LUTR910107', 'LUTR910108', 'LUTR910109', 'MCLA710101', 'MCLA720101', 'MEHP950102', 'MICC010101', 'MIRL960101', 'MIYS850102', 'MIYS850103', 'MIYS930101', 'MIYS960101', 'MIYS960102', 'MIYS960103', 'MIYS990106', 'MIYS990107', 'MIYT790101', 'MOHR870101', 'MOOG990101', 'MUET010101', 'MUET020101', 'MUET020102', 'NAOD960101', 'NGPC000101', 'NIEK910101', 'NIEK910102', 'OGAK980101', 'OVEJ920100_RSA', 'OVEJ920101', 'OVEJ920102', 'OVEJ920103', 'PRLA000101', 'PRLA000102', 'QUIB020101', 'QU_C930101', 'QU_C930102', 'QU_C930103', 'RIER950101', 'RISJ880101', 'RUSR970101', 'RUSR970102', 'RUSR970103', 'SIMK990101', 'SIMK990102', 'SIMK990103', 'SIMK990104', 'SIMK990105', 'SKOJ000101', 'SKOJ000102', 'SKOJ970101', 'TANS760101', 'TANS760102', 'THOP960101', 'TOBD000101', 'TOBD000102', 'TUDE900101', 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'] ================================================================ Evolutionary features (n): 3 These are: ['consurf_score', 'snap2_score', 'provean_score'] ================================================================ Genomic features (n): 6 These are: ['maf', 'logorI'] ['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'] ================================================================ Categorical features (n): 7 These are: ['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'] ================================================================ Pass: No. of features match ##################################################################### Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03524065 0.031986 0.03195024 0.03289199 0.02481055 0.05734158 0.06366634 0.04798388 0.02917767 0.03274918] mean value: 0.038779807090759275 key: score_time value: [0.01227498 0.01202679 0.01314306 0.01324391 0.0123229 0.01349568 0.01344991 0.01196861 0.01190662 0.01218939] mean value: 0.01260218620300293 key: test_mcc value: [0.33796318 0.54761905 0.0952381 0.77380952 0.65477023 0.53246753 0.89188259 0.12182898 0.2548236 0.2987013 ] mean value: 0.45091040737717836 key: train_mcc value: [0.83287487 0.78705463 0.79925792 0.81149011 0.76271746 0.81037732 0.8120727 0.82431059 0.82431059 0.84779256] mean value: 0.8112258763592134 key: test_accuracy value: [0.68421053 0.78947368 0.57894737 0.89473684 0.84210526 0.77777778 0.94444444 0.61111111 0.66666667 0.66666667] mean value: 0.7456140350877193 key: train_accuracy value: [0.92168675 0.89759036 0.90361446 0.90963855 0.88554217 0.91017964 0.91017964 0.91616766 0.91616766 0.92814371] mean value: 0.909891061250992 key: test_fscore value: [0.75 0.83333333 0.66666667 0.91666667 0.88 0.81818182 0.95238095 0.72 0.76923077 0.72727273] mean value: 0.8033732933732933 key: train_fscore value: [0.93838863 0.92165899 0.92592593 0.93023256 0.91324201 0.93023256 0.93087558 0.93518519 0.93518519 0.94339623] mean value: 0.9304322835927279 key: test_precision value: [0.69230769 0.83333333 0.66666667 0.91666667 0.84615385 0.81818182 1. 0.64285714 0.66666667 0.72727273] mean value: 0.781010656010656 key: train_precision value: [0.91666667 0.86956522 0.87719298 0.88495575 0.85470085 0.89285714 0.88596491 0.89380531 0.89380531 0.91743119] mean value: 0.8886945340694777 key: test_recall value: [0.81818182 0.83333333 0.66666667 0.91666667 0.91666667 0.81818182 0.90909091 0.81818182 0.90909091 0.72727273] mean value: 0.8333333333333334 key: train_recall value: [0.96116505 0.98039216 0.98039216 0.98039216 0.98039216 0.97087379 0.98058252 0.98058252 0.98058252 0.97087379] mean value: 0.9766228821625738 key: test_roc_auc value: [0.65909091 0.77380952 0.54761905 0.88690476 0.81547619 0.76623377 0.95454545 0.55194805 0.5974026 0.64935065] mean value: 0.7202380952380952 key: train_roc_auc value: [0.90915395 0.87300858 0.88082108 0.88863358 0.85738358 0.89168689 0.88872876 0.89654126 0.89654126 0.91512439] mean value: 0.8897623339384297 key: test_jcc value: [0.6 0.71428571 0.5 0.84615385 0.78571429 0.69230769 0.90909091 0.5625 0.625 0.57142857] mean value: 0.6806481018981019 key: train_jcc value: [0.88392857 0.85470085 0.86206897 0.86956522 0.84033613 0.86956522 0.87068966 0.87826087 0.87826087 0.89285714] mean value: 0.870023349804305 MCC on Blind test: 0.42 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.93467402 1.02162886 0.67857623 0.93024802 0.72668695 0.85950851 0.93305063 0.77515554 0.76321864 1.07244253] mean value: 0.8695189952850342 key: score_time value: [0.01315522 0.01336765 0.01328516 0.01339579 0.0135901 0.01321673 0.01317739 0.01686883 0.01603985 0.01210642] mean value: 0.013820314407348632 key: test_mcc value: [0.60553007 0.45361105 0.67460105 0.80507649 0.77380952 0.66254135 0.56407607 0.64465837 0.44320263 0.2987013 ] mean value: 0.5925807909458823 key: train_mcc value: [1. 1. 1. 1. 1. 0.98737524 1. 0.98737524 1. 0.91120799] mean value: 0.9885958461530414 key: test_accuracy value: [0.78947368 0.73684211 0.84210526 0.89473684 0.89473684 0.83333333 0.72222222 0.83333333 0.72222222 0.66666667] mean value: 0.7935672514619883 key: train_accuracy value: [1. 1. 1. 1. 1. 0.99401198 1. 0.99401198 1. 0.95808383] mean value: 0.9946107784431137 key: test_fscore value: [0.84615385 0.7826087 0.86956522 0.90909091 0.91666667 0.85714286 0.70588235 0.86956522 0.81481481 0.72727273] mean value: 0.829876330451778 key: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( train_fscore value: [1. 1. 1. 1. 1. 0.99516908 1. 0.99516908 1. 0.96650718] mean value: 0.99568453412847 key: test_precision value: [0.73333333 0.81818182 0.90909091 1. 0.91666667 0.9 1. 0.83333333 0.6875 0.72727273] mean value: 0.8525378787878788 key: train_precision value: [1. 1. 1. 1. 1. 0.99038462 1. 0.99038462 1. 0.95283019] mean value: 0.9933599419448476 key: test_recall value: [1. 0.75 0.83333333 0.83333333 0.91666667 0.81818182 0.54545455 0.90909091 1. 0.72727273] mean value: 0.8333333333333334 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.98058252] mean value: 0.9980582524271845 key: test_roc_auc value: [0.75 0.73214286 0.8452381 0.91666667 0.88690476 0.83766234 0.77272727 0.81168831 0.64285714 0.64935065] mean value: 0.7845238095238095 key: train_roc_auc value: [1. 1. 1. 1. 1. 0.9921875 1. 0.9921875 1. 0.95122876] mean value: 0.9935603762135923 key: test_jcc value: [0.73333333 0.64285714 0.76923077 0.83333333 0.84615385 0.75 0.54545455 0.76923077 0.6875 0.57142857] mean value: 0.7148522311022311 key: train_jcc value: [1. 1. 1. 1. 1. 0.99038462 1. 0.99038462 1. 0.93518519] mean value: 0.9915954415954416 MCC on Blind test: 0.23 Accuracy on Blind test: 0.61 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01257062 0.01140189 0.00913525 0.00892401 0.00878763 0.00891304 0.00894451 0.00909281 0.00889635 0.00975704] mean value: 0.009642314910888673 key: score_time value: [0.01189208 0.00911927 0.00909996 0.00931287 0.00870299 0.0086081 0.00883341 0.00891304 0.00927591 0.00955582] mean value: 0.009331345558166504 key: test_mcc value: [0.34405118 0.26772484 0.03912304 0.40849122 0.14085904 0.26856633 0.2987013 0.06493506 0.56061191 0.40291148] mean value: 0.2795975400517249 key: train_mcc value: [0.57098929 0.35088235 0.40877514 0.55947749 0.40877514 0.55309666 0.46678391 0.53583369 0.49453247 0.45408591] mean value: 0.4803232036222982 key: test_accuracy value: [0.68421053 0.68421053 0.57894737 0.73684211 0.63157895 0.66666667 0.66666667 0.55555556 0.77777778 0.72222222] mean value: 0.67046783625731 key: train_accuracy value: [0.80120482 0.70481928 0.72891566 0.79518072 0.72891566 0.79041916 0.75449102 0.78443114 0.76646707 0.74850299] mean value: 0.7603347521823822 key: test_fscore value: [0.76923077 0.78571429 0.69230769 0.81481481 0.74074074 0.75 0.72727273 0.63636364 0.84615385 0.7826087 ] mean value: 0.7545207208250686 key: train_fscore value: [0.84507042 0.79324895 0.79638009 0.84545455 0.79638009 0.83253589 0.81278539 0.8317757 0.8202765 0.80733945] mean value: 0.8181247015599945 key: test_precision value: [0.66666667 0.6875 0.64285714 0.73333333 0.66666667 0.69230769 0.72727273 0.63636364 0.73333333 0.75 ] mean value: 0.6936301198801199 key: train_precision value: [0.81818182 0.6962963 0.7394958 0.78813559 0.7394958 0.82075472 0.76724138 0.8018018 0.78070175 0.76521739] mean value: 0.7717322348120701 key: test_recall value: [0.90909091 0.91666667 0.75 0.91666667 0.83333333 0.81818182 0.72727273 0.63636364 1. 0.81818182] mean value: 0.8325757575757575 key: train_recall value: [0.87378641 0.92156863 0.8627451 0.91176471 0.8627451 0.84466019 0.86407767 0.86407767 0.86407767 0.85436893] mean value: 0.8723872073101084 key: test_roc_auc value: [0.64204545 0.60119048 0.51785714 0.67261905 0.55952381 0.62337662 0.64935065 0.53246753 0.71428571 0.69480519] mean value: 0.6207521645021645 key: train_roc_auc value: [0.77816305 0.64047181 0.68918505 0.76056985 0.68918505 0.7738926 0.72110133 0.76016383 0.73672633 0.71624697] mean value: 0.7265705877820383 key: test_jcc value: [0.625 0.64705882 0.52941176 0.6875 0.58823529 0.6 0.57142857 0.46666667 0.73333333 0.64285714] mean value: 0.6091491596638655 key: train_jcc value: [0.73170732 0.65734266 0.66165414 0.73228346 0.66165414 0.71311475 0.68461538 0.712 0.6953125 0.67692308] mean value: 0.6926607425296271 MCC on Blind test: 0.45 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01034927 0.00944018 0.00949478 0.00973845 0.00962996 0.00887012 0.0089128 0.00977898 0.00905228 0.00937176] mean value: 0.009463858604431153 key: score_time value: [0.00965571 0.00940299 0.00865912 0.00867152 0.009197 0.00870323 0.00895262 0.0092957 0.00875807 0.00866055] mean value: 0.008995652198791504 key: test_mcc value: [ 0.23262105 0.23262105 -0.01163105 0.28690229 0.32142857 0.34188173 -0.02548236 -0.32232919 -0.16883117 0.43320011] mean value: 0.1320381044112035 key: train_mcc value: [0.38992541 0.37624725 0.38970588 0.37720787 0.42954422 0.36848818 0.4353138 0.48789999 0.33479889 0.37453283] mean value: 0.39636643214511924 key: test_accuracy value: [0.63157895 0.63157895 0.47368421 0.68421053 0.68421053 0.66666667 0.5 0.38888889 0.44444444 0.72222222] mean value: 0.5827485380116959 key: train_accuracy value: [0.71084337 0.69879518 0.71084337 0.71084337 0.72891566 0.69461078 0.73053892 0.76047904 0.68263473 0.7005988 ] mean value: 0.7129103239304524 key: test_fscore value: [0.69565217 0.69565217 0.5 0.76923077 0.75 0.7 0.57142857 0.52173913 0.54545455 0.76190476] mean value: 0.6511062126279518 key: train_fscore value: [0.76470588 0.74747475 0.76470588 0.77358491 0.77832512 0.74371859 0.77832512 0.80952381 0.73891626 0.75247525] mean value: 0.7651755570317448 key: test_precision value: [0.66666667 0.72727273 0.625 0.71428571 0.75 0.77777778 0.6 0.5 0.54545455 0.8 ] mean value: 0.6706457431457431 key: train_precision value: [0.77227723 0.77083333 0.76470588 0.74545455 0.78217822 0.77083333 0.79 0.79439252 0.75 0.76767677] mean value: 0.7708351831059962 key: test_recall value: [0.72727273 0.66666667 0.41666667 0.83333333 0.75 0.63636364 0.54545455 0.54545455 0.54545455 0.72727273] mean value: 0.6393939393939394 key: train_recall value: [0.75728155 0.7254902 0.76470588 0.80392157 0.7745098 0.7184466 0.76699029 0.82524272 0.72815534 0.73786408] mean value: 0.7602608033504664 key: test_roc_auc value: [0.61363636 0.61904762 0.49404762 0.63095238 0.66071429 0.67532468 0.48701299 0.34415584 0.41558442 0.72077922] mean value: 0.5661255411255411 key: train_roc_auc value: [0.69610109 0.6908701 0.69485294 0.68321078 0.7153799 0.6873483 0.71943265 0.74074636 0.66876517 0.68924454] mean value: 0.6985951834212649 key: test_jcc value: [0.53333333 0.53333333 0.33333333 0.625 0.6 0.53846154 0.4 0.35294118 0.375 0.61538462] mean value: 0.4906787330316742 key: train_jcc value: [0.61904762 0.59677419 0.61904762 0.63076923 0.63709677 0.592 0.63709677 0.68 0.5859375 0.6031746 ] mean value: 0.6200944313974556 MCC on Blind test: 0.45 Accuracy on Blind test: 0.72 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01021385 0.0117085 0.00883484 0.00892735 0.0085423 0.00943303 0.00997519 0.00961924 0.00955629 0.00957227] mean value: 0.009638285636901856 key: score_time value: [0.05058432 0.03134537 0.01030421 0.00968504 0.01054215 0.01008773 0.01054263 0.01056457 0.01067567 0.01038671] mean value: 0.01647183895111084 key: test_mcc value: [-0.07954545 -0.26196842 -0.28414557 -0.33071891 -0.20865621 -0.16883117 -0.05096472 -0.0805823 -0.42640143 0.26856633] mean value: -0.1623247858043403 key: train_mcc value: [0.40149161 0.42213076 0.37917381 0.40791958 0.42567075 0.39903847 0.35572255 0.39451676 0.39528332 0.41049956] mean value: 0.399144718616529 key: test_accuracy value: [0.47368421 0.52631579 0.42105263 0.47368421 0.47368421 0.44444444 0.55555556 0.5 0.38888889 0.66666667] mean value: 0.49239766081871345 key: train_accuracy value: [0.72891566 0.73493976 0.71686747 0.72891566 0.73493976 0.7245509 0.70658683 0.7245509 0.7245509 0.73053892] mean value: 0.7255356756366784 key: test_fscore value: [0.54545455 0.68965517 0.56 0.64285714 0.61538462 0.54545455 0.69230769 0.60869565 0.56 0.75 ] mean value: 0.6209809366046247 key: train_fscore value: [0.80176211 0.8018018 0.79111111 0.79820628 0.7962963 0.79090909 0.78026906 0.8 0.79646018 0.79820628] mean value: 0.7955022205996671 key: test_precision value: [0.54545455 0.58823529 0.53846154 0.5625 0.57142857 0.54545455 0.6 0.58333333 0.5 0.69230769] mean value: 0.5727175520557873 key: train_precision value: [0.73387097 0.74166667 0.72357724 0.73553719 0.75438596 0.74358974 0.725 0.72440945 0.73170732 0.74166667] mean value: 0.7355411201324364 key: test_recall value: [0.54545455 0.83333333 0.58333333 0.75 0.66666667 0.54545455 0.81818182 0.63636364 0.63636364 0.81818182] mean value: 0.6833333333333333 key: train_recall value: [0.88349515 0.87254902 0.87254902 0.87254902 0.84313725 0.84466019 0.84466019 0.89320388 0.87378641 0.86407767] mean value: 0.8664667808871122 key: test_roc_auc value: [0.46022727 0.41666667 0.36309524 0.375 0.4047619 0.41558442 0.48051948 0.46103896 0.31818182 0.62337662] mean value: 0.4318452380952381 key: train_roc_auc value: [0.67984281 0.69408701 0.67064951 0.68627451 0.70281863 0.6879551 0.6645176 0.67316444 0.6790807 0.68985133] mean value: 0.6828241642530799 key: test_jcc value: [0.375 0.52631579 0.38888889 0.47368421 0.44444444 0.375 0.52941176 0.4375 0.38888889 0.6 ] mean value: 0.45391339869281044 key: train_jcc value: [0.66911765 0.66917293 0.65441176 0.6641791 0.66153846 0.65413534 0.63970588 0.66666667 0.66176471 0.6641791 ] mean value: 0.6604871607837044 MCC on Blind test: 0.08 Accuracy on Blind test: 0.54 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.0124259 0.0118289 0.01121783 0.01115513 0.01249623 0.01109719 0.01236582 0.01226473 0.01196218 0.01111054] mean value: 0.011792445182800293 key: score_time value: [0.01009512 0.00946808 0.00996041 0.0101192 0.00939679 0.01001549 0.00991511 0.0099535 0.00997281 0.00915146] mean value: 0.009804797172546387 key: test_mcc value: [ 0.40219983 0.26772484 0.3086067 0.3086067 0.3086067 0.39594419 0.39594419 -0.05096472 0.3040345 0.39594419] mean value: 0.303664710755213 key: train_mcc value: [0.5635375 0.54404241 0.59782919 0.56865593 0.53158234 0.54476067 0.53640723 0.58634752 0.54476067 0.59862298] mean value: 0.5616546427256583 key: test_accuracy value: [0.68421053 0.68421053 0.68421053 0.68421053 0.68421053 0.72222222 0.72222222 0.55555556 0.66666667 0.72222222] mean value: 0.6809941520467836 key: train_accuracy value: [0.78313253 0.77108434 0.80120482 0.78313253 0.76506024 0.77245509 0.77245509 0.79640719 0.77245509 0.80239521] mean value: 0.7819782122501984 key: test_fscore value: [0.78571429 0.78571429 0.8 0.8 0.8 0.8 0.8 0.69230769 0.78571429 0.8 ] mean value: 0.784945054945055 key: train_fscore value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [0.85123967 0.84297521 0.85957447 0.85 0.83950617 0.8442623 0.84297521 0.85714286 0.8442623 0.86075949] mean value: 0.8492697664546919 key: test_precision value: [0.64705882 0.6875 0.66666667 0.66666667 0.66666667 0.71428571 0.71428571 0.6 0.64705882 0.71428571] mean value: 0.6724474789915966 key: train_precision value: [0.74100719 0.72857143 0.7593985 0.73913043 0.72340426 0.73049645 0.73381295 0.75555556 0.73049645 0.76119403] mean value: 0.74030672520064 key: test_recall value: [1. 0.91666667 1. 1. 1. 0.90909091 0.90909091 0.81818182 1. 0.90909091] mean value: 0.9462121212121212 key: train_recall value: [1. 1. 0.99019608 1. 1. 1. 0.99029126 0.99029126 1. 0.99029126] mean value: 0.9961069864839139 key: test_roc_auc value: [0.625 0.60119048 0.57142857 0.57142857 0.57142857 0.66883117 0.66883117 0.48051948 0.57142857 0.66883117] mean value: 0.5998917748917749 key: train_roc_auc value: [0.71428571 0.703125 0.74509804 0.71875 0.6953125 0.703125 0.70608313 0.73733313 0.703125 0.74514563] mean value: 0.7171383146705284 key: test_jcc value: [0.64705882 0.64705882 0.66666667 0.66666667 0.66666667 0.66666667 0.66666667 0.52941176 0.64705882 0.66666667] mean value: 0.6470588235294118 key: train_jcc value: [0.74100719 0.72857143 0.75373134 0.73913043 0.72340426 0.73049645 0.72857143 0.75 0.73049645 0.75555556] mean value: 0.7380964548129775 MCC on Blind test: 0.37 Accuracy on Blind test: 0.64 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.14798498 0.80445504 0.68168712 0.778126 0.67166209 0.71176672 0.82175732 0.63936996 0.6600368 0.79345798] mean value: 0.7710304021835327 key: score_time value: [0.01561236 0.01476002 0.01496482 0.01485348 0.01530099 0.01830029 0.01525712 0.01215148 0.0121603 0.02092552] mean value: 0.015428638458251953 key: test_mcc value: [ 0.33796318 0.18531233 0.32142857 0.32142857 0.45361105 0.64465837 0.79772404 -0.0805823 0.01413507 0.40291148] mean value: 0.33985903712040866 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.68421053 0.63157895 0.68421053 0.68421053 0.73684211 0.83333333 0.88888889 0.5 0.55555556 0.72222222] mean value: 0.6921052631578948 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.72 0.75 0.75 0.7826087 0.86956522 0.9 0.60869565 0.66666667 0.7826087 ] mean value: 0.7580144927536232 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.69230769 0.69230769 0.75 0.75 0.81818182 0.83333333 1. 0.58333333 0.61538462 0.75 ] mean value: 0.7484848484848485 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.75 0.75 0.75 0.75 0.90909091 0.81818182 0.63636364 0.72727273 0.81818182] mean value: 0.7727272727272727 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.65909091 0.58928571 0.66071429 0.66071429 0.73214286 0.81168831 0.90909091 0.46103896 0.50649351 0.69480519] mean value: 0.6685064935064935 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.5625 0.6 0.6 0.64285714 0.76923077 0.81818182 0.4375 0.5 0.64285714] mean value: 0.6173126873126873 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.0201478 0.01317859 0.01518607 0.01245022 0.01219487 0.01223302 0.01228309 0.01388884 0.01354384 0.01307154] mean value: 0.013817787170410156 key: score_time value: [0.01178908 0.00907445 0.00897789 0.00866628 0.00874162 0.00875974 0.00865197 0.008744 0.00954509 0.00889468] mean value: 0.009184479713439941 key: test_mcc value: [0.45361105 1. 0.65133895 0.80507649 0.54761905 0.66254135 0.79772404 0.26856633 0.88640526 0.53246753] mean value: 0.6605350037589758 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 1. 0.78947368 0.89473684 0.78947368 0.83333333 0.88888889 0.66666667 0.94444444 0.77777778] mean value: 0.8321637426900584 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7826087 1. 0.8 0.90909091 0.83333333 0.85714286 0.9 0.75 0.95652174 0.81818182] mean value: 0.8606879352531527 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 1. 1. 1. 0.83333333 0.9 1. 0.69230769 0.91666667 0.81818182] mean value: 0.8910489510489511 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.66666667 0.83333333 0.83333333 0.81818182 0.81818182 0.81818182 1. 0.81818182] mean value: 0.8424242424242424 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.72159091 1. 0.83333333 0.91666667 0.77380952 0.83766234 0.90909091 0.62337662 0.92857143 0.76623377] mean value: 0.8310335497835498 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64285714 1. 0.66666667 0.83333333 0.71428571 0.75 0.81818182 0.6 0.91666667 0.69230769] mean value: 0.7634299034299035 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.52 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09460139 0.09494543 0.09293103 0.09291887 0.09456015 0.09315991 0.09335065 0.09217739 0.09356213 0.10142875] mean value: 0.09436357021331787 key: score_time value: [0.01741433 0.01736999 0.01729822 0.01721883 0.01758862 0.01811028 0.01707053 0.01714253 0.01738429 0.01856899] mean value: 0.017516660690307616 key: test_mcc value: [0.08257228 0.53468154 0.1495142 0.42004128 0.32142857 0.39594419 0.76623377 0.39594419 0.2548236 0.20385888] mean value: 0.35250424955725035 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.57894737 0.78947368 0.57894737 0.73684211 0.68421053 0.72222222 0.88888889 0.72222222 0.66666667 0.61111111] mean value: 0.697953216374269 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.69230769 0.84615385 0.63636364 0.8 0.75 0.8 0.90909091 0.8 0.76923077 0.66666667] mean value: 0.766981351981352 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6 0.78571429 0.7 0.76923077 0.75 0.71428571 0.90909091 0.71428571 0.66666667 0.7 ] mean value: 0.7309274059274059 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.91666667 0.58333333 0.83333333 0.75 0.90909091 0.90909091 0.90909091 0.90909091 0.63636364] mean value: 0.8174242424242424 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.53409091 0.74404762 0.57738095 0.70238095 0.66071429 0.66883117 0.88311688 0.66883117 0.5974026 0.6038961 ] mean value: 0.664069264069264 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.52941176 0.73333333 0.46666667 0.66666667 0.6 0.66666667 0.83333333 0.66666667 0.625 0.5 ] mean value: 0.6287745098039216 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.32 Accuracy on Blind test: 0.65 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01018333 0.0099957 0.00997639 0.00996017 0.01004481 0.00998259 0.00940919 0.00883722 0.00895667 0.00889516] mean value: 0.009624123573303223 key: score_time value: [0.00936508 0.00946832 0.00939035 0.00943542 0.00943351 0.00934362 0.00854778 0.00865889 0.0086751 0.00854993] mean value: 0.009086799621582032 key: test_mcc value: [ 0.25844328 0.0952381 0.13095238 0.1495142 0.32142857 0.43320011 0.48416483 -0.0805823 0.64465837 0.40291148] mean value: 0.2839929034172316 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.63157895 0.57894737 0.52631579 0.57894737 0.68421053 0.72222222 0.72222222 0.5 0.83333333 0.72222222] mean value: 0.65 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.66666667 0.52631579 0.63636364 0.75 0.76190476 0.73684211 0.60869565 0.86956522 0.7826087 ] mean value: 0.7005629191555965 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.66666667 0.71428571 0.7 0.75 0.8 0.875 0.58333333 0.83333333 0.75 ] mean value: 0.7372619047619048 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.66666667 0.41666667 0.58333333 0.75 0.72727273 0.63636364 0.63636364 0.90909091 0.81818182] mean value: 0.678030303030303 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.63068182 0.54761905 0.56547619 0.57738095 0.66071429 0.72077922 0.74675325 0.46103896 0.81168831 0.69480519] mean value: 0.641693722943723 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.5 0.35714286 0.46666667 0.6 0.61538462 0.58333333 0.4375 0.76923077 0.64285714] mean value: 0.5472115384615385 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.25 Accuracy on Blind test: 0.62 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.21576595 1.23462939 1.23632216 1.24061584 1.27298141 1.25783706 1.2260108 1.21766496 1.2200973 1.23109865] mean value: 1.2353023529052733 key: score_time value: [0.08846188 0.09100533 0.09566069 0.15491486 0.09412932 0.08944058 0.08805823 0.09122467 0.09319806 0.094805 ] mean value: 0.09808986186981201 key: test_mcc value: [0.60553007 0.88949918 0.67460105 0.65477023 0.56694671 0.64465837 0.89188259 0.39594419 0.77742884 0.52299758] mean value: 0.6624258812978574 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.94736842 0.84210526 0.84210526 0.78947368 0.83333333 0.94444444 0.72222222 0.88888889 0.77777778] mean value: 0.837719298245614 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84615385 0.96 0.86956522 0.88 0.85714286 0.86956522 0.95238095 0.8 0.91666667 0.83333333] mean value: 0.8784808090460264 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.73333333 0.92307692 0.90909091 0.84615385 0.75 0.83333333 1. 0.71428571 0.84615385 0.76923077] mean value: 0.8324658674658675 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.83333333 0.91666667 1. 0.90909091 0.90909091 0.90909091 1. 0.90909091] mean value: 0.9386363636363636 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.92857143 0.8452381 0.81547619 0.71428571 0.81168831 0.95454545 0.66883117 0.85714286 0.74025974] mean value: 0.8086038961038962 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.73333333 0.92307692 0.76923077 0.78571429 0.75 0.76923077 0.90909091 0.66666667 0.84615385 0.71428571] mean value: 0.7866783216783216 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.29 Accuracy on Blind test: 0.62 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( key: fit_time value: [1.74438 0.86402607 0.87671947 0.88783979 0.97754407 0.87419868 0.87052846 0.87867832 0.88394403 0.91006446] mean value: 0.9767923355102539 key: score_time value: [0.22032857 0.17637062 0.18659782 0.2488842 0.18379068 0.22443295 0.2052598 0.24423671 0.17268133 0.18593717] mean value: 0.20485198497772217 key: test_mcc value: [0.60553007 0.65477023 0.53468154 0.88949918 0.53468154 0.39594419 0.76623377 0.2548236 0.67005939 0.67005939] mean value: 0.5976282906983714 key: train_mcc value: [0.89849587 0.88685769 0.87457979 0.87457979 0.88685769 0.88899836 0.8872319 0.91188694 0.86279135 0.89953068] mean value: 0.8871810060846004 key: test_accuracy value: [0.78947368 0.84210526 0.78947368 0.94736842 0.78947368 0.72222222 0.88888889 0.66666667 0.83333333 0.83333333] mean value: 0.810233918128655 key: train_accuracy value: [0.95180723 0.94578313 0.93975904 0.93975904 0.94578313 0.94610778 0.94610778 0.95808383 0.93413174 0.95209581] mean value: 0.9459418512372845 key: test_fscore value: [0.84615385 0.88 0.84615385 0.96 0.84615385 0.8 0.90909091 0.76923077 0.88 0.88 ] mean value: 0.8616783216783217 key: train_fscore value: [0.96226415 0.95734597 0.95283019 0.95283019 0.95734597 0.95813953 0.95774648 0.96682464 0.94883721 0.96226415] mean value: 0.9576428489982294 key: test_precision value: [0.73333333 0.84615385 0.78571429 0.92307692 0.78571429 0.71428571 0.90909091 0.66666667 0.78571429 0.78571429] mean value: 0.7935464535464536 key: train_precision value: [0.93577982 0.9266055 0.91818182 0.91818182 0.9266055 0.91964286 0.92727273 0.94444444 0.91071429 0.93577982] mean value: 0.9263208593139786 key: test_recall value: [1. 0.91666667 0.91666667 1. 0.91666667 0.90909091 0.90909091 0.90909091 1. 1. ] mean value: 0.9477272727272728 key: train_recall value: [0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 1. 0.99029126 0.99029126 0.99029126 0.99029126] mean value: 0.9912240624405102 key: test_roc_auc value: [0.75 0.81547619 0.74404762 0.92857143 0.74404762 0.66883117 0.88311688 0.5974026 0.78571429 0.78571429] mean value: 0.7702922077922078 key: train_roc_auc value: [0.93959008 0.93259804 0.92478554 0.92478554 0.93259804 0.9296875 0.93264563 0.94827063 0.91702063 0.94045813] mean value: 0.9322439756646995 key: test_jcc value: [0.73333333 0.78571429 0.73333333 0.92307692 0.73333333 0.66666667 0.83333333 0.625 0.78571429 0.78571429] mean value: 0.760521978021978 key: train_jcc value: [0.92727273 0.91818182 0.90990991 0.90990991 0.91818182 0.91964286 0.91891892 0.93577982 0.90265487 0.92727273] mean value: 0.9187725370561085 MCC on Blind test: 0.35 Accuracy on Blind test: 0.64 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01017141 0.01006699 0.01010513 0.01003337 0.01007533 0.0101161 0.00903153 0.00938582 0.00902843 0.00992584] mean value: 0.009793996810913086 key: score_time value: [0.00961804 0.00945258 0.00932074 0.00960994 0.0095284 0.00941348 0.00897074 0.0086844 0.00946951 0.00944734] mean value: 0.009351515769958496 key: test_mcc value: [ 0.23262105 0.23262105 -0.01163105 0.28690229 0.32142857 0.34188173 -0.02548236 -0.32232919 -0.16883117 0.43320011] mean value: 0.1320381044112035 key: train_mcc value: [0.38992541 0.37624725 0.38970588 0.37720787 0.42954422 0.36848818 0.4353138 0.48789999 0.33479889 0.37453283] mean value: 0.39636643214511924 key: test_accuracy value: [0.63157895 0.63157895 0.47368421 0.68421053 0.68421053 0.66666667 0.5 0.38888889 0.44444444 0.72222222] mean value: 0.5827485380116959 key: train_accuracy value: [0.71084337 0.69879518 0.71084337 0.71084337 0.72891566 0.69461078 0.73053892 0.76047904 0.68263473 0.7005988 ] mean value: 0.7129103239304524 key: test_fscore value: [0.69565217 0.69565217 0.5 0.76923077 0.75 0.7 0.57142857 0.52173913 0.54545455 0.76190476] mean value: 0.6511062126279518 key: train_fscore value: [0.76470588 0.74747475 0.76470588 0.77358491 0.77832512 0.74371859 0.77832512 0.80952381 0.73891626 0.75247525] mean value: 0.7651755570317448 key: test_precision value: [0.66666667 0.72727273 0.625 0.71428571 0.75 0.77777778 0.6 0.5 0.54545455 0.8 ] mean value: 0.6706457431457431 key: train_precision value: [0.77227723 0.77083333 0.76470588 0.74545455 0.78217822 0.77083333 0.79 0.79439252 0.75 0.76767677] mean value: 0.7708351831059962 key: test_recall value: [0.72727273 0.66666667 0.41666667 0.83333333 0.75 0.63636364 0.54545455 0.54545455 0.54545455 0.72727273] mean value: 0.6393939393939394 key: train_recall value: [0.75728155 0.7254902 0.76470588 0.80392157 0.7745098 0.7184466 0.76699029 0.82524272 0.72815534 0.73786408] mean value: 0.7602608033504664 key: test_roc_auc value: [0.61363636 0.61904762 0.49404762 0.63095238 0.66071429 0.67532468 0.48701299 0.34415584 0.41558442 0.72077922] mean value: 0.5661255411255411 key: train_roc_auc value: [0.69610109 0.6908701 0.69485294 0.68321078 0.7153799 0.6873483 0.71943265 0.74074636 0.66876517 0.68924454] mean value: 0.6985951834212649 key: test_jcc value: [0.53333333 0.53333333 0.33333333 0.625 0.6 0.53846154 0.4 0.35294118 0.375 0.61538462] mean value: 0.4906787330316742 key: train_jcc value: [0.61904762 0.59677419 0.61904762 0.63076923 0.63709677 0.592 0.63709677 0.68 0.5859375 0.6031746 ] mean value: 0.6200944313974556 MCC on Blind test: 0.45 Accuracy on Blind test: 0.72 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09132409 0.05846572 0.06459308 0.05377865 0.05312014 0.0551908 0.05610585 0.05561304 0.05613637 0.05491066] mean value: 0.059923839569091794 key: score_time value: [0.01047754 0.01103997 0.01097846 0.0104785 0.01049376 0.01075339 0.01066589 0.01044464 0.01035118 0.01046562] mean value: 0.010614895820617675 key: test_mcc value: [0.45361105 0.88949918 0.89559105 1. 0.7824608 0.76623377 0.89188259 0.39594419 0.88640526 0.43320011] mean value: 0.7394827993424129 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 0.94736842 0.94736842 1. 0.89473684 0.88888889 0.94444444 0.72222222 0.94444444 0.72222222] mean value: 0.8748538011695907 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.7826087 0.96 0.95652174 1. 0.92307692 0.90909091 0.95238095 0.8 0.95652174 0.76190476] mean value: 0.900210572036659 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.92307692 1. 1. 0.85714286 0.90909091 1. 0.71428571 0.91666667 0.8 ] mean value: 0.887026307026307 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.91666667 1. 1. 0.90909091 0.90909091 0.90909091 1. 0.72727273] mean value: 0.918939393939394 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.72159091 0.92857143 0.95833333 1. 0.85714286 0.88311688 0.95454545 0.66883117 0.92857143 0.72077922] mean value: 0.8621482683982684 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64285714 0.92307692 0.91666667 1. 0.85714286 0.83333333 0.90909091 0.66666667 0.91666667 0.61538462] mean value: 0.828088578088578 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.53 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04992199 0.06900644 0.06069613 0.06046605 0.05153871 0.09395623 0.06998301 0.05825424 0.024194 0.04331779] mean value: 0.05813345909118652 key: score_time value: [0.02851057 0.03522396 0.02049589 0.02068567 0.02075195 0.01262808 0.02250338 0.01193881 0.01193452 0.0223732 ] mean value: 0.02070460319519043 key: test_mcc value: [0.45868247 0.36803496 0.43034895 0.42004128 0.77380952 0.88640526 0.48416483 0.4025974 0.12182898 0.48416483] mean value: 0.4830078495368003 key: train_mcc value: [0.96182348 0.94915491 0.92371324 0.96223327 0.93656134 0.89835373 0.97466626 0.9748321 0.96196428 0.96196428] mean value: 0.9505266878223336 key: test_accuracy value: [0.73684211 0.68421053 0.68421053 0.73684211 0.89473684 0.94444444 0.72222222 0.66666667 0.61111111 0.72222222] mean value: 0.7403508771929824 key: train_accuracy value: [0.98192771 0.97590361 0.96385542 0.98192771 0.96987952 0.95209581 0.98802395 0.98802395 0.98203593 0.98203593] mean value: 0.976570954476589 key: test_fscore value: [0.8 0.72727273 0.7 0.8 0.91666667 0.95652174 0.73684211 0.66666667 0.72 0.73684211] mean value: 0.7760812010262811 key: train_fscore value: [0.98536585 0.98058252 0.97058824 0.98550725 0.97584541 0.96153846 0.99029126 0.99038462 0.98550725 0.98550725] mean value: 0.9811118102041951 key: test_precision value: [0.71428571 0.8 0.875 0.76923077 0.91666667 0.91666667 0.875 0.85714286 0.64285714 0.875 ] mean value: 0.8241849816849817 key: train_precision value: [0.99019608 0.97115385 0.97058824 0.97142857 0.96190476 0.95238095 0.99029126 0.98095238 0.98076923 0.98076923] mean value: 0.9750434550220387 key: test_recall value: [0.90909091 0.66666667 0.58333333 0.83333333 0.91666667 1. 0.63636364 0.54545455 0.81818182 0.63636364] mean value: 0.7545454545454545 key: train_recall value: [0.98058252 0.99019608 0.97058824 1. 0.99019608 0.97087379 0.99029126 1. 0.99029126 0.99029126] mean value: 0.9873310489244241 key: test_roc_auc value: [0.70454545 0.69047619 0.7202381 0.70238095 0.88690476 0.92857143 0.74675325 0.7012987 0.55194805 0.74675325] mean value: 0.737987012987013 key: train_roc_auc value: [0.98235475 0.97166054 0.96185662 0.9765625 0.96384804 0.94637439 0.98733313 0.984375 0.97952063 0.97952063] mean value: 0.9733406236685613 key: test_jcc value: [0.66666667 0.57142857 0.53846154 0.66666667 0.84615385 0.91666667 0.58333333 0.5 0.5625 0.58333333] mean value: 0.6435210622710623 key: train_jcc value: [0.97115385 0.96190476 0.94285714 0.97142857 0.95283019 0.92592593 0.98076923 0.98095238 0.97142857 0.97142857] mean value: 0.9630679191528249 MCC on Blind test: 0.16 Accuracy on Blind test: 0.58 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0232811 0.00937867 0.00896811 0.00889492 0.00896215 0.00895619 0.00900936 0.01012516 0.01084137 0.00878739] mean value: 0.010720443725585938 key: score_time value: [0.00920415 0.00897765 0.0090692 0.0088675 0.0088439 0.00873423 0.00930619 0.00982428 0.00979805 0.00853658] mean value: 0.009116172790527344 key: test_mcc value: [ 0.34405118 0.18531233 -0.04941662 0.42004128 0.14085904 0.26856633 0.2987013 0.16116459 0.0805823 0.26856633] mean value: 0.2118428048944046 key: train_mcc value: [0.37947231 0.36682397 0.37021128 0.40845955 0.39898595 0.3183612 0.34304366 0.3576444 0.42468968 0.34769188] mean value: 0.37153838914990045 key: test_accuracy value: [0.68421053 0.63157895 0.52631579 0.73684211 0.63157895 0.66666667 0.66666667 0.61111111 0.61111111 0.66666667] mean value: 0.6432748538011696 key: train_accuracy value: [0.71686747 0.71084337 0.71084337 0.72891566 0.72289157 0.68862275 0.7005988 0.70658683 0.73652695 0.7005988 ] mean value: 0.7123295577519659 key: test_fscore value: [0.76923077 0.72 0.64 0.8 0.74074074 0.75 0.72727273 0.69565217 0.74074074 0.75 ] mean value: 0.7333637151898021 key: train_fscore value: [0.78538813 0.78378378 0.77981651 0.80519481 0.78703704 0.76363636 0.77477477 0.77828054 0.8018018 0.7706422 ] mean value: 0.7830355952665203 key: test_precision value: [0.66666667 0.69230769 0.61538462 0.76923077 0.66666667 0.69230769 0.72727273 0.66666667 0.625 0.69230769] mean value: 0.6813811188811189 key: train_precision value: [0.74137931 0.725 0.73275862 0.72093023 0.74561404 0.71794872 0.72268908 0.72881356 0.74789916 0.73043478] mean value: 0.7313467493853907 key: test_recall value: [0.90909091 0.75 0.66666667 0.83333333 0.83333333 0.81818182 0.72727273 0.72727273 0.90909091 0.81818182] mean value: 0.7992424242424243 key: train_recall value: [0.83495146 0.85294118 0.83333333 0.91176471 0.83333333 0.81553398 0.83495146 0.83495146 0.86407767 0.81553398] mean value: 0.8431372549019608 key: test_roc_auc value: [0.64204545 0.58928571 0.47619048 0.70238095 0.55952381 0.62337662 0.64935065 0.57792208 0.52597403 0.62337662] mean value: 0.5969426406926407 key: train_roc_auc value: [0.67938049 0.66865809 0.67447917 0.67463235 0.69010417 0.64995449 0.65966323 0.66747573 0.69766383 0.66557949] mean value: 0.6727591036414566 key: test_jcc value: [0.625 0.5625 0.47058824 0.66666667 0.58823529 0.6 0.57142857 0.53333333 0.58823529 0.6 ] mean value: 0.5805987394957983 key: train_jcc value: [0.64661654 0.64444444 0.63909774 0.67391304 0.64885496 0.61764706 0.63235294 0.63703704 0.66917293 0.62686567] mean value: 0.6436002376478708 MCC on Blind test: 0.43 Accuracy on Blind test: 0.7 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01141 0.01607895 0.01427817 0.01620102 0.01566148 0.01515269 0.01575112 0.01580167 0.01595831 0.01498699] mean value: 0.015128040313720703 key: score_time value: [0.00860381 0.01093388 0.01091313 0.01146984 0.01156068 0.01149845 0.0115273 0.01151919 0.01151872 0.01149392] mean value: 0.01110389232635498 key: test_mcc value: [0.35227273 0.7824608 0.36803496 0.58655573 0.40849122 0.76623377 0.56061191 0.2548236 0.40291148 0.32232919] mean value: 0.48047253774078613 key: train_mcc value: [0.87956612 0.94974006 0.81149011 0.84765971 0.81698712 0.8872319 0.54476067 0.91320801 0.83195371 0.74686754] mean value: 0.8229464950111158 key: test_accuracy value: [0.68421053 0.89473684 0.68421053 0.78947368 0.73684211 0.88888889 0.77777778 0.66666667 0.72222222 0.61111111] mean value: 0.7456140350877193 key: train_accuracy value: [0.93975904 0.97590361 0.90963855 0.92168675 0.90963855 0.94610778 0.77245509 0.95808383 0.91017964 0.85628743] mean value: 0.9099740278479186 key: test_fscore value: [0.72727273 0.92307692 0.72727273 0.81818182 0.81481481 0.90909091 0.84615385 0.76923077 0.7826087 0.58823529] mean value: 0.7905938524864355 key: train_fscore value: [0.94949495 0.98019802 0.93023256 0.93264249 0.93150685 0.95774648 0.8442623 0.96713615 0.92146597 0.86813187] mean value: 0.9282817624706369 key: test_precision value: [0.72727273 0.85714286 0.8 0.9 0.73333333 0.90909091 0.73333333 0.66666667 0.75 0.83333333] mean value: 0.791017316017316 key: train_precision value: [0.98947368 0.99 0.88495575 0.98901099 0.87179487 0.92727273 0.73049645 0.93636364 1. 1. ] mean value: 0.9319368114765849 key: test_recall value: [0.72727273 1. 0.66666667 0.75 0.91666667 0.90909091 1. 0.90909091 0.81818182 0.45454545] mean value: 0.8151515151515152 key: train_recall value: [0.91262136 0.97058824 0.98039216 0.88235294 1. 0.99029126 1. 1. 0.85436893 0.76699029] mean value: 0.9357605177993528 key: test_roc_auc value: [0.67613636 0.85714286 0.69047619 0.80357143 0.67261905 0.88311688 0.71428571 0.5974026 0.69480519 0.65584416] mean value: 0.7245400432900433 key: train_roc_auc value: [0.94837417 0.97748162 0.88863358 0.93336397 0.8828125 0.93264563 0.703125 0.9453125 0.92718447 0.88349515] mean value: 0.9022428581060256 key: test_jcc value: [0.57142857 0.85714286 0.57142857 0.69230769 0.6875 0.83333333 0.73333333 0.625 0.64285714 0.41666667] mean value: 0.6630998168498169 key: train_jcc value: [0.90384615 0.96116505 0.86956522 0.87378641 0.87179487 0.91891892 0.73049645 0.93636364 0.85436893 0.76699029] mean value: 0.8687295931827245 MCC on Blind test: 0.3 Accuracy on Blind test: 0.64 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01471901 0.01381922 0.01489019 0.01392627 0.01413941 0.01386118 0.01413894 0.01493621 0.01305819 0.01561236] mean value: 0.014310097694396973 key: score_time value: [0.01162362 0.01153278 0.01147699 0.01158118 0.01145577 0.01147771 0.01151061 0.01157641 0.0116725 0.01168513] mean value: 0.011559271812438964 key: test_mcc value: [0.29545455 0.65133895 0.53468154 0.3086067 0.65477023 0.66254135 0. 0.3040345 0.2987013 0.2548236 ] mean value: 0.39649527059477535 key: train_mcc value: [0.76345722 0.73678413 0.61692545 0.46724931 0.91088941 0.80279484 0.28456079 0.54476067 0.64944256 0.95111825] mean value: 0.6727982631270347 key: test_accuracy value: [0.63157895 0.78947368 0.78947368 0.68421053 0.84210526 0.83333333 0.61111111 0.66666667 0.66666667 0.61111111] mean value: 0.7125730994152046 key: train_accuracy value: [0.86746988 0.84939759 0.80722892 0.73493976 0.95783133 0.89820359 0.66467066 0.77245509 0.82035928 0.9760479 ] mean value: 0.8348603996825626 key: test_fscore value: [0.63157895 0.8 0.84615385 0.8 0.88 0.85714286 0.75862069 0.78571429 0.72727273 0.63157895] mean value: 0.7718062300675731 key: train_fscore value: [0.88043478 0.8603352 0.86440678 0.82258065 0.96618357 0.9119171 0.78625954 0.8442623 0.84210526 0.98019802] mean value: 0.8758683196313127 key: test_precision value: [0.75 1. 0.78571429 0.66666667 0.84615385 0.9 0.61111111 0.64705882 0.72727273 0.75 ] mean value: 0.7683977460448048 key: train_precision value: [1. 1. 0.76119403 0.69863014 0.95238095 0.97777778 0.64779874 0.73049645 0.91954023 1. ] mean value: 0.8687818322919909 key: test_recall value: [0.54545455 0.66666667 0.91666667 1. 0.91666667 0.81818182 1. 1. 0.72727273 0.54545455] mean value: 0.8136363636363636 key: train_recall value: [0.78640777 0.75490196 1. 1. 0.98039216 0.85436893 1. 1. 0.77669903 0.96116505] mean value: 0.9113934894346087 key: test_roc_auc value: [0.64772727 0.83333333 0.74404762 0.57142857 0.81547619 0.83766234 0.5 0.57142857 0.64935065 0.62987013] mean value: 0.6800324675324675 key: train_roc_auc value: [0.89320388 0.87745098 0.75 0.65625 0.95113358 0.91155947 0.5625 0.703125 0.83366201 0.98058252] mean value: 0.8119467447173044 key: test_jcc value: [0.46153846 0.66666667 0.73333333 0.66666667 0.78571429 0.75 0.61111111 0.64705882 0.57142857 0.46153846] mean value: 0.635505638152697 key: train_jcc value: [0.78640777 0.75490196 0.76119403 0.69863014 0.93457944 0.83809524 0.64779874 0.73049645 0.72727273 0.96116505] mean value: 0.7840541543814717 MCC on Blind test: 0.25 Accuracy on Blind test: 0.62 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.12054205 0.10526896 0.10383749 0.10514784 0.10670519 0.10818172 0.11293268 0.11231232 0.10736513 0.10779858] mean value: 0.10900919437408448 key: score_time value: [0.01530385 0.0149107 0.01513171 0.01525569 0.01502442 0.01519895 0.0160737 0.01602936 0.0148201 0.01555538] mean value: 0.0153303861618042 key: test_mcc value: [0.56729535 0.67460105 1. 0.89559105 0.67460105 0.76623377 0.89188259 0.52299758 0.77742884 0.53246753] mean value: 0.7303098813708148 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.84210526 1. 0.94736842 0.84210526 0.88888889 0.94444444 0.77777778 0.88888889 0.77777778] mean value: 0.8698830409356725 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83333333 0.86956522 1. 0.95652174 0.86956522 0.90909091 0.95238095 0.83333333 0.91666667 0.81818182] mean value: 0.8958639186900056 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76923077 0.90909091 1. 1. 0.90909091 0.90909091 1. 0.76923077 0.84615385 0.81818182] mean value: 0.893006993006993 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.83333333 1. 0.91666667 0.83333333 0.90909091 0.90909091 0.90909091 1. 0.81818182] mean value: 0.9037878787878788 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76704545 0.8452381 1. 0.95833333 0.8452381 0.88311688 0.95454545 0.74025974 0.85714286 0.76623377] mean value: 0.861715367965368 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.71428571 0.76923077 1. 0.91666667 0.76923077 0.83333333 0.90909091 0.71428571 0.84615385 0.69230769] mean value: 0.8164585414585415 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.56 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.04221368 0.03988671 0.04898667 0.03659725 0.04938769 0.03940272 0.03929019 0.05242944 0.05490804 0.03600407] mean value: 0.043910646438598634 key: score_time value: [0.01941323 0.02854228 0.03419876 0.02406669 0.01860428 0.01948833 0.02925777 0.02827168 0.01718235 0.01625252] mean value: 0.02352778911590576 key: test_mcc value: [0.56729535 1. 0.80507649 0.89559105 1. 0.66254135 0.79772404 0.56061191 0.88640526 0.53246753] mean value: 0.7707712971733849 key: train_mcc value: [1. 0.97457108 0.96182348 0.9873287 0.96204463 1. 0.98744925 0.97466626 0.98744925 0.94933931] mean value: 0.9784671941115295 key: test_accuracy value: [0.78947368 1. 0.89473684 0.94736842 1. 0.83333333 0.88888889 0.77777778 0.94444444 0.77777778] mean value: 0.8853801169590643 key: train_accuracy value: [1. 0.98795181 0.98192771 0.9939759 0.98192771 1. 0.99401198 0.98802395 0.99401198 0.9760479 ] mean value: 0.9897878940913354 key: test_fscore value: [0.83333333 1. 0.90909091 0.95652174 1. 0.85714286 0.9 0.84615385 0.95652174 0.81818182] mean value: 0.9076946242163634 key: train_fscore value: [1. 0.99019608 0.98536585 0.99512195 0.98522167 1. 0.99512195 0.99029126 0.99512195 0.98076923] mean value: 0.9917209953530446 key: test_precision value: [0.76923077 1. 1. 1. 1. 0.9 1. 0.73333333 0.91666667 0.81818182] mean value: 0.9137412587412588 key: train_precision value: [1. 0.99019608 0.98058252 0.99029126 0.99009901 1. 1. 0.99029126 1. 0.97142857] mean value: 0.9912888708304624 key: test_recall value: [0.90909091 1. 0.83333333 0.91666667 1. 0.81818182 0.81818182 1. 1. 0.81818182] mean value: 0.9113636363636364 key: train_recall value: [1. 0.99019608 0.99019608 1. 0.98039216 1. 0.99029126 0.99029126 0.99029126 0.99029126] mean value: 0.992194936226918 key: test_roc_auc value: [0.76704545 1. 0.91666667 0.95833333 1. 0.83766234 0.90909091 0.71428571 0.92857143 0.76623377] mean value: 0.8797889610389611 key: train_roc_auc value: [1. 0.98728554 0.97947304 0.9921875 0.98238358 1. 0.99514563 0.98733313 0.99514563 0.97170813] mean value: 0.989066218113459 key: test_jcc value: [0.71428571 1. 0.83333333 0.91666667 1. 0.75 0.81818182 0.73333333 0.91666667 0.69230769] mean value: 0.8374775224775225 key: train_jcc value: [1. 0.98058252 0.97115385 0.99029126 0.97087379 1. 0.99029126 0.98076923 0.99029126 0.96226415] mean value: 0.9836517324953852 MCC on Blind test: 0.14 Accuracy on Blind test: 0.55 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.03705645 0.05874968 0.07011271 0.0530026 0.05404568 0.05509114 0.02280903 0.02378893 0.02255702 0.03889585] mean value: 0.04361090660095215 key: score_time value: [0.02256751 0.02284932 0.02406359 0.02447772 0.02279687 0.02272964 0.01286435 0.0128026 0.01270914 0.03028536] mean value: 0.02081460952758789 key: test_mcc value: [ 0.56729535 0.14085904 0.0952381 -0.03149704 -0.12677314 0.01413507 0.12182898 0.39594419 0.01413507 0.01413507] mean value: 0.12053006836854342 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.63157895 0.57894737 0.57894737 0.52631579 0.55555556 0.61111111 0.72222222 0.55555556 0.55555556] mean value: 0.6105263157894737 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83333333 0.74074074 0.66666667 0.71428571 0.66666667 0.66666667 0.72 0.8 0.66666667 0.66666667] mean value: 0.7141693121693122 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76923077 0.66666667 0.66666667 0.625 0.6 0.61538462 0.64285714 0.71428571 0.61538462 0.61538462] mean value: 0.6530860805860806 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.83333333 0.66666667 0.83333333 0.75 0.72727273 0.81818182 0.90909091 0.72727273 0.72727273] mean value: 0.7901515151515152 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.76704545 0.55952381 0.54761905 0.48809524 0.44642857 0.50649351 0.55194805 0.66883117 0.50649351 0.50649351] mean value: 0.5548971861471862 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.71428571 0.58823529 0.5 0.55555556 0.5 0.5 0.5625 0.66666667 0.5 0.5 ] mean value: 0.5587243230625584 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.32965541 0.33532381 0.32610035 0.33066726 0.31862354 0.31567144 0.32750487 0.32567739 0.32546377 0.32319188] mean value: 0.3257879734039307 key: score_time value: [0.00963497 0.00933599 0.00911212 0.00913239 0.00947118 0.00982332 0.01011229 0.0100255 0.01008368 0.01003385] mean value: 0.009676527976989747 key: test_mcc value: [0.56818182 0.77380952 0.89559105 1. 0.7824608 0.76623377 1. 0.39594419 0.88640526 0.64465837] mean value: 0.7713284776335373 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.89473684 0.94736842 1. 0.89473684 0.88888889 1. 0.72222222 0.94444444 0.83333333] mean value: 0.8915204678362573 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.81818182 0.91666667 0.95652174 1. 0.92307692 0.90909091 1. 0.8 0.95652174 0.86956522] mean value: 0.9149625012668491 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 0.91666667 1. 1. 0.85714286 0.90909091 1. 0.71428571 0.91666667 0.83333333] mean value: 0.8965367965367965 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.91666667 0.91666667 1. 1. 0.90909091 1. 0.90909091 1. 0.90909091] mean value: 0.9378787878787879 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.78409091 0.88690476 0.95833333 1. 0.85714286 0.88311688 1. 0.66883117 0.92857143 0.81168831] mean value: 0.8778679653679654 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.69230769 0.84615385 0.91666667 1. 0.85714286 0.83333333 1. 0.66666667 0.91666667 0.76923077] mean value: 0.8498168498168498 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.54 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01851869 0.01976252 0.01977897 0.02018905 0.01982856 0.02080202 0.02046824 0.02414322 0.02400231 0.02459073] mean value: 0.021208429336547853 key: score_time value: [0.0122931 0.01221442 0.01403546 0.01435971 0.01452565 0.01226997 0.01526237 0.01487613 0.01891351 0.02681375] mean value: 0.015556406974792481 key: test_mcc value: [-0.20100756 0.18531233 -0.01163105 0.09356015 0.09356015 -0.1934765 0.2987013 0.3040345 -0.24029619 0.2987013 ] mean value: 0.0627458419060211 key: train_mcc value: [1. 1. 0.97474109 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9974741089883715 key: test_accuracy value: [0.52631579 0.63157895 0.47368421 0.63157895 0.63157895 0.55555556 0.66666667 0.66666667 0.44444444 0.66666667] mean value: 0.5894736842105263 key: train_accuracy value: [1. 1. 0.98795181 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9987951807228915 key: test_fscore value: [0.68965517 0.72 0.5 0.75862069 0.75862069 0.71428571 0.72727273 0.78571429 0.58333333 0.72727273] mean value: 0.6964775339602925 key: train_fscore value: [1. 1. 0.99029126 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9990291262135922 key: test_precision value: [0.55555556 0.69230769 0.625 0.64705882 0.64705882 0.58823529 0.72727273 0.64705882 0.53846154 0.72727273] mean value: 0.6395282005576124 key: train_precision value: [1. 1. 0.98076923 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9980769230769231 key: test_recall value: [0.90909091 0.75 0.41666667 0.91666667 0.91666667 0.90909091 0.72727273 1. 0.63636364 0.72727273] mean value: 0.7909090909090909 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.45454545 0.58928571 0.49404762 0.5297619 0.5297619 0.45454545 0.64935065 0.57142857 0.38961039 0.64935065] mean value: 0.5311688311688312 key: train_roc_auc value: [1. 1. 0.984375 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9984375 key: test_jcc value: [0.52631579 0.5625 0.33333333 0.61111111 0.61111111 0.55555556 0.57142857 0.64705882 0.41176471 0.57142857] mean value: 0.5401607572853703 key: train_jcc value: [1. 1. 0.98076923 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9980769230769231 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03033352 0.0397439 0.03561759 0.03916883 0.03715229 0.03341866 0.03735018 0.05201197 0.05242395 0.05699086] mean value: 0.04142117500305176 key: score_time value: [0.02387834 0.0204978 0.02155447 0.02060032 0.02355218 0.02056217 0.02346349 0.02178597 0.02437901 0.02108812] mean value: 0.02213618755340576 key: test_mcc value: [0.10863102 0.42004128 0.67460105 0.77380952 0.77380952 0.76623377 0.56407607 0.40291148 0.44320263 0.53246753] mean value: 0.5459783889001629 key: train_mcc value: [0.92308458 0.93744159 0.89919089 0.91088941 0.89798254 0.91188694 0.92430455 0.92539974 0.94997541 0.91188694] mean value: 0.919204259351807 key: test_accuracy value: [0.57894737 0.73684211 0.84210526 0.89473684 0.89473684 0.88888889 0.72222222 0.72222222 0.72222222 0.77777778] mean value: 0.7780701754385965 key: train_accuracy value: [0.96385542 0.96987952 0.95180723 0.95783133 0.95180723 0.95808383 0.96407186 0.96407186 0.9760479 0.95808383] mean value: 0.9615540004328692 key: test_fscore value: [0.66666667 0.8 0.86956522 0.91666667 0.91666667 0.90909091 0.70588235 0.7826087 0.81481481 0.81818182] mean value: 0.8200143808072197 key: train_fscore value: [0.97115385 0.97607656 0.96190476 0.96618357 0.96116505 0.96682464 0.97142857 0.97169811 0.98095238 0.96682464] mean value: 0.9694212141193473 key: test_precision value: [0.61538462 0.76923077 0.90909091 0.91666667 0.91666667 0.90909091 1. 0.75 0.6875 0.81818182] mean value: 0.8291812354312355 key: train_precision value: [0.96190476 0.95327103 0.93518519 0.95238095 0.95192308 0.94444444 0.95327103 0.94495413 0.96261682 0.94444444] mean value: 0.9504395872227905 key: test_recall value: [0.72727273 0.83333333 0.83333333 0.91666667 0.91666667 0.90909091 0.54545455 0.81818182 1. 0.81818182] mean value: 0.8318181818181818 key: train_recall value: [0.98058252 1. 0.99019608 0.98039216 0.97058824 0.99029126 0.99029126 1. 1. 0.99029126] mean value: 0.9892632781267847 key: test_roc_auc value: [0.55113636 0.70238095 0.8452381 0.88690476 0.88690476 0.88311688 0.77272727 0.69480519 0.64285714 0.76623377] mean value: 0.7632305194805196 key: train_roc_auc value: [0.95854523 0.9609375 0.94041054 0.95113358 0.94623162 0.94827063 0.95608313 0.953125 0.96875 0.94827063] mean value: 0.9531757858887892 key: test_jcc value: [0.5 0.66666667 0.76923077 0.84615385 0.84615385 0.83333333 0.54545455 0.64285714 0.6875 0.69230769] mean value: 0.7029657842157843 key: train_jcc value: [0.94392523 0.95327103 0.9266055 0.93457944 0.92523364 0.93577982 0.94444444 0.94495413 0.96261682 0.93577982] mean value: 0.940718987872379 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.37415504 0.30020595 0.36566615 0.33783174 0.32791495 0.40239978 0.32688546 0.32808876 0.33113956 0.33140039] mean value: 0.34256877899169924 key: score_time value: [0.02503872 0.02488184 0.02039814 0.02254295 0.01626158 0.02357626 0.023417 0.02344203 0.02036166 0.02339745] mean value: 0.022331762313842773 key: test_mcc value: [0.10863102 0.54761905 0.67460105 0.77380952 0.77380952 0.76623377 0.56407607 0.40291148 0.39594419 0.71350607] mean value: 0.5721141750028133 key: train_mcc value: [0.92308458 0.92403878 0.89919089 0.91088941 0.89798254 0.91188694 0.92430455 0.92539974 0.94933931 0.94933931] mean value: 0.9215456054544839 key: test_accuracy value: [0.57894737 0.78947368 0.84210526 0.89473684 0.89473684 0.88888889 0.72222222 0.72222222 0.72222222 0.83333333] mean value: 0.7888888888888889 key: train_accuracy value: [0.96385542 0.96385542 0.95180723 0.95783133 0.95180723 0.95808383 0.96407186 0.96407186 0.9760479 0.9760479 ] mean value: 0.9627479979799437 key: test_fscore value: [0.66666667 0.83333333 0.86956522 0.91666667 0.91666667 0.90909091 0.70588235 0.7826087 0.8 0.84210526] mean value: 0.8242585771566792 key: train_fscore value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:114: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:117: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.97115385 0.97115385 0.96190476 0.96618357 0.96116505 0.96682464 0.97142857 0.97169811 0.98076923 0.98076923] mean value: 0.9703050868359714 key: test_precision value: [0.61538462 0.83333333 0.90909091 0.91666667 0.91666667 0.90909091 1. 0.75 0.71428571 1. ] mean value: 0.8564518814518814 key: train_precision value: [0.96190476 0.95283019 0.93518519 0.95238095 0.95192308 0.94444444 0.95327103 0.94495413 0.97142857 0.97142857] mean value: 0.9539750908852559 key: test_recall value: [0.72727273 0.83333333 0.83333333 0.91666667 0.91666667 0.90909091 0.54545455 0.81818182 0.90909091 0.72727273] mean value: 0.8136363636363636 key: train_recall value: [0.98058252 0.99019608 0.99019608 0.98039216 0.97058824 0.99029126 0.99029126 1. 0.99029126 0.99029126] mean value: 0.9873120121835142 key: test_roc_auc value: [0.55113636 0.77380952 0.8452381 0.88690476 0.88690476 0.88311688 0.77272727 0.69480519 0.66883117 0.86363636] mean value: 0.782711038961039 key: train_roc_auc value: [0.95854523 0.95603554 0.94041054 0.95113358 0.94623162 0.94827063 0.95608313 0.953125 0.97170813 0.97170813] mean value: 0.9553251529171539 key: test_jcc value: [0.5 0.71428571 0.76923077 0.84615385 0.84615385 0.83333333 0.54545455 0.64285714 0.66666667 0.72727273] mean value: 0.7091408591408591 key: train_jcc value: [0.94392523 0.94392523 0.9266055 0.93457944 0.92523364 0.93577982 0.94444444 0.94495413 0.96226415 0.96226415] mean value: 0.942397574727439 MCC on Blind test: 0.14 Accuracy on Blind test: 0.57 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03312874 0.06524324 0.10863423 0.13530827 0.03684163 0.03828144 0.03505182 0.06022382 0.09920526 0.03362274] mean value: 0.06455411911010742 key: score_time value: [0.01303244 0.01533794 0.0123136 0.01206446 0.01194906 0.01522326 0.01591754 0.01242018 0.01871157 0.01189852] mean value: 0.013886857032775878 key: test_mcc value: [0.74047959 0.6992059 0.56818182 0.56490196 0.65151515 0.83971912 0.74047959 0.66414149 0.45454545 0.37796447] mean value: 0.6301134542242922 key: train_mcc value: [0.83418999 0.88292404 0.903143 0.85368872 0.85370265 0.88292404 0.85368872 0.83418999 0.81557242 0.88366175] mean value: 0.8597685325684289 key: test_accuracy value: [0.86956522 0.82608696 0.7826087 0.7826087 0.82608696 0.91304348 0.86956522 0.82608696 0.72727273 0.68181818] mean value: 0.8104743083003952 key: train_accuracy value: [0.91707317 0.94146341 0.95121951 0.92682927 0.92682927 0.94146341 0.92682927 0.91707317 0.90776699 0.94174757] mean value: 0.9298295050911675 key: test_fscore value: [0.85714286 0.84615385 0.7826087 0.76190476 0.83333333 0.90909091 0.88 0.81818182 0.72727273 0.63157895] mean value: 0.8047267896100848 key: train_fscore value: [0.91707317 0.94174757 0.95049505 0.92753623 0.92682927 0.94117647 0.92610837 0.91707317 0.90731707 0.94230769] mean value: 0.9297664074411536 key: test_precision value: [0.9 0.73333333 0.75 0.8 0.83333333 1. 0.84615385 0.9 0.72727273 0.75 ] mean value: 0.824009324009324 key: train_precision value: [0.92156863 0.94174757 0.96969697 0.92307692 0.9223301 0.94117647 0.93069307 0.91262136 0.91176471 0.93333333] mean value: 0.9308009128461939 key: test_recall value: [0.81818182 1. 0.81818182 0.72727273 0.83333333 0.83333333 0.91666667 0.75 0.72727273 0.54545455] mean value: 0.796969696969697 key: train_recall value: [0.91262136 0.94174757 0.93203883 0.93203883 0.93137255 0.94117647 0.92156863 0.92156863 0.90291262 0.95145631] mean value: 0.9288501808490387 key: test_roc_auc value: [0.86742424 0.83333333 0.78409091 0.78030303 0.82575758 0.91666667 0.86742424 0.82954545 0.72727273 0.68181818] mean value: 0.8113636363636364 key: train_roc_auc value: [0.91709499 0.94146202 0.95131354 0.92680373 0.92685132 0.94146202 0.92680373 0.91709499 0.90776699 0.94174757] mean value: 0.9298400913763564 key: test_jcc value: [0.75 0.73333333 0.64285714 0.61538462 0.71428571 0.83333333 0.78571429 0.69230769 0.57142857 0.46153846] mean value: 0.680018315018315 key: train_jcc value: [0.84684685 0.88990826 0.90566038 0.86486486 0.86363636 0.88888889 0.86238532 0.84684685 0.83035714 0.89090909] mean value: 0.8690304000190187 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.82665825 0.93447065 0.77925324 0.78353238 0.88928652 0.84824395 0.78165197 0.86581111 0.82213378 0.89065075] mean value: 0.8421692609786987 key: score_time value: [0.01458549 0.01205873 0.01185203 0.01503658 0.0150156 0.01501513 0.0152204 0.01511884 0.01503301 0.01509953] mean value: 0.014403533935546876 key: test_mcc value: [0.65909298 0.63327851 0.56818182 0.65151515 0.74242424 0.91666667 0.74047959 0.65151515 0.63636364 0.46225016] mean value: 0.6661767909021908 key: train_mcc value: [1. 0.99029126 0.95126594 1. 0.95126594 0.92194936 1. 1. 1. 0.9223301 ] mean value: 0.9737102608033504 key: test_accuracy value: [0.82608696 0.7826087 0.7826087 0.82608696 0.86956522 0.95652174 0.86956522 0.82608696 0.81818182 0.72727273] mean value: 0.8284584980237154 key: train_accuracy value: [1. 0.99512195 0.97560976 1. 0.97560976 0.96097561 1. 1. 1. 0.96116505] mean value: 0.9868482121714421 key: test_fscore value: [0.8 0.81481481 0.7826087 0.81818182 0.86956522 0.95652174 0.88 0.83333333 0.81818182 0.7 ] mean value: 0.8273207436685698 key: train_fscore value: [1. 0.99512195 0.97560976 1. 0.97560976 0.96078431 1. 1. 1. 0.96116505] mean value: 0.9868290825683814 key: test_precision value: [0.88888889 0.6875 0.75 0.81818182 0.90909091 1. 0.84615385 0.83333333 0.81818182 0.77777778] mean value: 0.8329108391608392 key: train_precision value: [1. 1. 0.98039216 1. 0.97087379 0.96078431 1. 1. 1. 0.96116505] mean value: 0.9873215305539692 key: test_recall value: [0.72727273 1. 0.81818182 0.81818182 0.83333333 0.91666667 0.91666667 0.83333333 0.81818182 0.63636364] mean value: 0.8318181818181818 key: train_recall value: [1. 0.99029126 0.97087379 1. 0.98039216 0.96078431 1. 1. 1. 0.96116505] mean value: 0.9863506567675614 key: test_roc_auc value: [0.8219697 0.79166667 0.78409091 0.82575758 0.87121212 0.95833333 0.86742424 0.82575758 0.81818182 0.72727273] mean value: 0.8291666666666666 key: train_roc_auc value: [1. 0.99514563 0.97563297 1. 0.97563297 0.96097468 1. 1. 1. 0.96116505] mean value: 0.9868551304016753 key: test_jcc value: [0.66666667 0.6875 0.64285714 0.69230769 0.76923077 0.91666667 0.78571429 0.71428571 0.69230769 0.53846154] mean value: 0.7105998168498169 key: train_jcc value: [1. 0.99029126 0.95238095 1. 0.95238095 0.9245283 1. 1. 1. 0.92523364] mean value: 0.9744815113644433 MCC on Blind test: 0.29 Accuracy on Blind test: 0.64 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01280308 0.01112795 0.00904179 0.00915575 0.00870848 0.0087254 0.008672 0.00886917 0.00864339 0.0087316 ] mean value: 0.009447860717773437 key: score_time value: [0.01178265 0.00904846 0.0089066 0.00885105 0.00853443 0.00852323 0.00856495 0.00855088 0.00860286 0.00863481] mean value: 0.008999991416931152 key: test_mcc value: [0.41096386 0.44411739 0.38932432 0.15096491 0.38932432 0.3030303 0.47727273 0.30240737 0.09245003 0.54772256] mean value: 0.35075777936506775 key: train_mcc value: [0.4448612 0.44400007 0.46806514 0.53843728 0.47567594 0.45607916 0.45709726 0.49637007 0.42964161 0.50892419] mean value: 0.4719151927110299 key: test_accuracy value: [0.69565217 0.69565217 0.69565217 0.56521739 0.69565217 0.65217391 0.73913043 0.65217391 0.54545455 0.77272727] mean value: 0.6709486166007905 key: train_accuracy value: [0.70243902 0.72195122 0.73170732 0.75609756 0.73658537 0.72682927 0.72682927 0.74634146 0.71359223 0.75242718] mean value: 0.7314799905280606 key: test_fscore value: [0.72 0.74074074 0.66666667 0.61538462 0.72 0.66666667 0.75 0.69230769 0.58333333 0.76190476] mean value: 0.6917004477004477 key: train_fscore value: [0.75502008 0.72727273 0.75113122 0.78991597 0.74766355 0.73831776 0.74074074 0.75925926 0.6974359 0.76712329] mean value: 0.747388048921837 key: test_precision value: [0.64285714 0.625 0.7 0.53333333 0.69230769 0.66666667 0.75 0.64285714 0.53846154 0.8 ] mean value: 0.6591483516483516 key: train_precision value: [0.64383562 0.71698113 0.70338983 0.6962963 0.71428571 0.70535714 0.70175439 0.71929825 0.73913043 0.72413793] mean value: 0.7064466729857495 key: test_recall value: [0.81818182 0.90909091 0.63636364 0.72727273 0.75 0.66666667 0.75 0.75 0.63636364 0.72727273] mean value: 0.7371212121212121 key: train_recall value: [0.91262136 0.73786408 0.80582524 0.91262136 0.78431373 0.7745098 0.78431373 0.80392157 0.66019417 0.81553398] mean value: 0.7991719017704169 key: test_roc_auc value: [0.70075758 0.70454545 0.69318182 0.5719697 0.69318182 0.65151515 0.73863636 0.64772727 0.54545455 0.77272727] mean value: 0.6719696969696969 key: train_roc_auc value: [0.70140872 0.72187322 0.73134399 0.75533029 0.73681706 0.72706073 0.72710832 0.74662098 0.71359223 0.75242718] mean value: 0.7313582714639254 key: test_jcc value: [0.5625 0.58823529 0.5 0.44444444 0.5625 0.5 0.6 0.52941176 0.41176471 0.61538462] mean value: 0.5314240824534943 key: train_jcc value: [0.60645161 0.57142857 0.60144928 0.65277778 0.59701493 0.58518519 0.58823529 0.6119403 0.53543307 0.62222222] mean value: 0.5972138233743687 MCC on Blind test: 0.45 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00916004 0.00893545 0.0089612 0.00898767 0.00895739 0.00898981 0.00910544 0.00896788 0.00898862 0.00901413] mean value: 0.009006762504577636 key: score_time value: [0.00864172 0.00866985 0.00862956 0.00853229 0.00861835 0.00858736 0.00872016 0.00858521 0.00866818 0.00860548] mean value: 0.00862581729888916 key: test_mcc value: [0.65909298 0.21452908 0.12336594 0.21452908 0.08257228 0.44411739 0.08257228 0.23262105 0.32539569 0.23570226] mean value: 0.26144980489209724 key: train_mcc value: [0.431714 0.44379575 0.47690661 0.38794503 0.41929975 0.43858746 0.45614118 0.4454215 0.40723148 0.39531893] mean value: 0.430236169300088 key: test_accuracy value: [0.82608696 0.60869565 0.56521739 0.60869565 0.52173913 0.69565217 0.52173913 0.60869565 0.63636364 0.59090909] mean value: 0.6183794466403162 key: train_accuracy value: [0.70243902 0.70731707 0.72195122 0.67804878 0.69268293 0.70731707 0.71707317 0.70731707 0.69417476 0.68446602] mean value: 0.7012787118162443 key: test_fscore value: [0.8 0.52631579 0.44444444 0.52631579 0.35294118 0.63157895 0.35294118 0.57142857 0.5 0.4 ] mean value: 0.5105965895129981 key: train_fscore value: [0.64327485 0.64705882 0.66272189 0.60240964 0.61349693 0.64705882 0.6627907 0.63855422 0.64 0.61538462] mean value: 0.6372750495347176 key: test_precision value: [0.88888889 0.625 0.57142857 0.625 0.6 0.85714286 0.6 0.66666667 0.8 0.75 ] mean value: 0.6984126984126984 key: train_precision value: [0.80882353 0.82089552 0.84848485 0.79365079 0.81967213 0.80882353 0.81428571 0.828125 0.77777778 0.78787879] mean value: 0.8108417634437052 key: test_recall value: [0.72727273 0.45454545 0.36363636 0.45454545 0.25 0.5 0.25 0.5 0.36363636 0.27272727] mean value: 0.41363636363636364 key: train_recall value: [0.53398058 0.53398058 0.54368932 0.48543689 0.49019608 0.53921569 0.55882353 0.51960784 0.54368932 0.50485437] mean value: 0.5253474205216067 key: test_roc_auc value: [0.8219697 0.60227273 0.55681818 0.60227273 0.53409091 0.70454545 0.53409091 0.61363636 0.63636364 0.59090909] mean value: 0.6196969696969696 key: train_roc_auc value: [0.7032648 0.70816676 0.72282505 0.67899296 0.69169998 0.70650105 0.71630497 0.70640586 0.69417476 0.68446602] mean value: 0.7012802208261946 key: test_jcc value: [0.66666667 0.35714286 0.28571429 0.35714286 0.21428571 0.46153846 0.21428571 0.4 0.33333333 0.25 ] mean value: 0.354010989010989 key: train_jcc value: [0.47413793 0.47826087 0.49557522 0.43103448 0.44247788 0.47826087 0.49565217 0.46902655 0.47058824 0.44444444] mean value: 0.4679458652592843 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00874019 0.00965261 0.0093236 0.00860143 0.00863314 0.00863075 0.00869918 0.00897694 0.00959516 0.009655 ] mean value: 0.009050798416137696 key: score_time value: [0.01461792 0.01053357 0.01011968 0.00984621 0.00996614 0.00994396 0.01022553 0.01024222 0.01070547 0.01071763] mean value: 0.01069183349609375 key: test_mcc value: [0.12878788 0.31298622 0.02585438 0.12406456 0.44411739 0.2096648 0.25495628 0.3030303 0.23570226 0. ] mean value: 0.20391640867334052 key: train_mcc value: [0.53446628 0.51172946 0.55610418 0.52267493 0.48193786 0.50002007 0.46832513 0.45886299 0.51700551 0.53764186] mean value: 0.5088768274038467 key: test_accuracy value: [0.56521739 0.65217391 0.52173913 0.56521739 0.69565217 0.56521739 0.60869565 0.65217391 0.59090909 0.5 ] mean value: 0.591699604743083 key: train_accuracy value: [0.76585366 0.75121951 0.77073171 0.75609756 0.73658537 0.74634146 0.72682927 0.72682927 0.75728155 0.76699029] mean value: 0.7504759649538243 key: test_fscore value: [0.54545455 0.66666667 0.35294118 0.5 0.63157895 0.375 0.52631579 0.66666667 0.4 0.42105263] mean value: 0.5085676423679519 key: train_fscore value: [0.75510204 0.72727273 0.7431694 0.7311828 0.70652174 0.72043011 0.68539326 0.70212766 0.74489796 0.75257732] mean value: 0.7268675006125136 key: test_precision value: [0.54545455 0.61538462 0.5 0.55555556 0.85714286 0.75 0.71428571 0.66666667 0.75 0.5 ] mean value: 0.6454489954489955 key: train_precision value: [0.79569892 0.80952381 0.85 0.81927711 0.79268293 0.79761905 0.80263158 0.76744186 0.78494624 0.8021978 ] mean value: 0.802201929530647 key: test_recall value: [0.54545455 0.72727273 0.27272727 0.45454545 0.5 0.25 0.41666667 0.66666667 0.27272727 0.36363636] mean value: 0.44696969696969696 key: train_recall value: [0.7184466 0.66019417 0.66019417 0.66019417 0.6372549 0.65686275 0.59803922 0.64705882 0.70873786 0.70873786] mean value: 0.6655720540643442 key: test_roc_auc value: [0.56439394 0.65530303 0.51136364 0.56060606 0.70454545 0.57954545 0.61742424 0.65151515 0.59090909 0.5 ] mean value: 0.593560606060606 key: train_roc_auc value: [0.76608605 0.75166571 0.77127356 0.75656768 0.73610318 0.7459071 0.72620407 0.72644203 0.75728155 0.76699029] mean value: 0.7504521225966115 key: test_jcc value: [0.375 0.5 0.21428571 0.33333333 0.46153846 0.23076923 0.35714286 0.5 0.25 0.26666667] mean value: 0.3488736263736264 key: train_jcc value: [0.60655738 0.57142857 0.59130435 0.57627119 0.54621849 0.56302521 0.52136752 0.54098361 0.59349593 0.60330579] mean value: 0.5713958028231724 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.0141499 0.01209903 0.01171756 0.01287889 0.01204228 0.01184416 0.01186538 0.01188636 0.01186728 0.01376891] mean value: 0.012411975860595703 key: score_time value: [0.01020265 0.00940251 0.00930929 0.01236463 0.00963902 0.00941229 0.00974226 0.00945449 0.00942039 0.01031494] mean value: 0.009926247596740722 key: test_mcc value: [0.58002308 0.6992059 0.38932432 0.21374669 0.5164589 0.58930667 0.58930667 0.39393939 0.2773501 0.2773501 ] mean value: 0.4526011806094105 key: train_mcc value: [0.68838106 0.71237056 0.72307355 0.72506339 0.70305132 0.70305132 0.66368352 0.72693519 0.72423827 0.74069712] mean value: 0.7110545294044077 key: test_accuracy value: [0.7826087 0.82608696 0.69565217 0.60869565 0.73913043 0.7826087 0.7826087 0.69565217 0.63636364 0.63636364] mean value: 0.7185770750988142 key: train_accuracy value: [0.84390244 0.85365854 0.85853659 0.85853659 0.84878049 0.84878049 0.82926829 0.86341463 0.8592233 0.86893204] mean value: 0.8533033388586313 key: test_fscore value: [0.73684211 0.84615385 0.66666667 0.57142857 0.7 0.76190476 0.76190476 0.69565217 0.6 0.6 ] mean value: 0.6940552887234809 key: train_fscore value: [0.84158416 0.84536082 0.84974093 0.84816754 0.83769634 0.83769634 0.81675393 0.86138614 0.84974093 0.86294416] mean value: 0.8451071285619147 key: test_precision value: [0.875 0.73333333 0.7 0.6 0.875 0.88888889 0.88888889 0.72727273 0.66666667 0.66666667] mean value: 0.7621717171717172 key: train_precision value: [0.85858586 0.9010989 0.91111111 0.92045455 0.8988764 0.8988764 0.87640449 0.87 0.91111111 0.90425532] mean value: 0.895077414988125 key: test_recall value: [0.63636364 1. 0.63636364 0.54545455 0.58333333 0.66666667 0.66666667 0.66666667 0.54545455 0.54545455] mean value: 0.6492424242424242 key: train_recall value: [0.82524272 0.7961165 0.7961165 0.78640777 0.78431373 0.78431373 0.76470588 0.85294118 0.7961165 0.82524272] mean value: 0.8011517228250523 key: test_roc_auc value: [0.77651515 0.83333333 0.69318182 0.60606061 0.74621212 0.78787879 0.78787879 0.6969697 0.63636364 0.63636364] mean value: 0.7200757575757576 key: train_roc_auc value: [0.84399391 0.85394061 0.85884257 0.85889016 0.84846754 0.84846754 0.82895488 0.86336379 0.8592233 0.86893204] mean value: 0.8533076337331049 key: test_jcc value: [0.58333333 0.73333333 0.5 0.4 0.53846154 0.61538462 0.61538462 0.53333333 0.42857143 0.42857143] mean value: 0.5376373626373626 key: train_jcc value: [0.72649573 0.73214286 0.73873874 0.73636364 0.72072072 0.72072072 0.69026549 0.75652174 0.73873874 0.75892857] mean value: 0.7319636936205809 MCC on Blind test: 0.33 Accuracy on Blind test: 0.67 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.5654161 0.70051932 0.43851113 0.8557241 0.81149411 0.945997 0.29760575 0.19453859 0.41354036 0.27692008] mean value: 0.5500266551971436 key: score_time value: [0.01253867 0.01216602 0.01216125 0.01217246 0.01259136 0.01223016 0.01214933 0.01222849 0.01218319 0.02718925] mean value: 0.013761019706726075 key: test_mcc value: [0.65151515 0.63327851 0.38932432 0.39727608 0.56818182 0.74242424 0.63327851 0.12406456 0.47140452 0.23570226] mean value: 0.48464499668963007 key: train_mcc value: [0.64278523 0.8360404 0.75277897 0.79983884 0.89371934 0.87321531 0.54046344 0.58230118 0.58157543 0.48196269] mean value: 0.6984680827686014 key: test_accuracy value: [0.82608696 0.7826087 0.69565217 0.69565217 0.7826087 0.86956522 0.7826087 0.56521739 0.68181818 0.59090909] mean value: 0.7272727272727273 key: train_accuracy value: [0.8 0.91707317 0.86829268 0.89756098 0.94634146 0.93658537 0.76097561 0.78536585 0.75728155 0.69417476] mean value: 0.8363651432630831 key: test_fscore value: [0.81818182 0.81481481 0.66666667 0.63157895 0.7826087 0.86956522 0.73684211 0.61538462 0.75862069 0.68965517] mean value: 0.7383918742791937 key: train_fscore value: [0.83127572 0.92018779 0.85405405 0.89230769 0.94472362 0.93658537 0.72316384 0.80357143 0.80314961 0.76404494] mean value: 0.8473064064396472 key: test_precision value: [0.81818182 0.6875 0.7 0.75 0.81818182 0.90909091 1. 0.57142857 0.61111111 0.55555556] mean value: 0.7421049783549784 key: train_precision value: [0.72142857 0.89090909 0.96341463 0.94565217 0.96907216 0.93203883 0.85333333 0.73770492 0.67549669 0.62195122] mean value: 0.8311001629916994 key: test_recall value: [0.81818182 1. 0.63636364 0.54545455 0.75 0.83333333 0.58333333 0.66666667 1. 0.90909091] mean value: 0.7742424242424243 key: train_recall value: [0.98058252 0.95145631 0.76699029 0.84466019 0.92156863 0.94117647 0.62745098 0.88235294 0.99029126 0.99029126] mean value: 0.8896820864268037 key: test_roc_auc value: [0.82575758 0.79166667 0.69318182 0.68939394 0.78409091 0.87121212 0.79166667 0.56060606 0.68181818 0.59090909] mean value: 0.728030303030303 key: train_roc_auc value: [0.79911479 0.91690463 0.86878926 0.89782029 0.94622121 0.93660765 0.76032743 0.78583666 0.75728155 0.69417476] mean value: 0.836307824100514 key: test_jcc value: [0.69230769 0.6875 0.5 0.46153846 0.64285714 0.76923077 0.58333333 0.44444444 0.61111111 0.52631579] mean value: 0.5918638744296639 key: train_jcc value: [0.71126761 0.85217391 0.74528302 0.80555556 0.8952381 0.88073394 0.56637168 0.67164179 0.67105263 0.61818182] mean value: 0.7417500055514455 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01879621 0.01864338 0.01436448 0.01555657 0.01544523 0.01569748 0.01556516 0.0156126 0.01505065 0.01566124] mean value: 0.016039299964904784 key: score_time value: [0.01181293 0.00982785 0.00936031 0.00934577 0.00941539 0.00954008 0.00944448 0.00944018 0.00936866 0.00942349] mean value: 0.009697914123535156 key: test_mcc value: [0.76277007 0.41096386 0.48856385 1. 0.76764947 0.83971912 0.83743579 0.91605722 0.91287093 0.91287093] mean value: 0.7848901253107335 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.69565217 0.73913043 1. 0.86956522 0.91304348 0.91304348 0.95652174 0.95454545 0.95454545] mean value: 0.8865612648221344 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84210526 0.72 0.75 1. 0.85714286 0.90909091 0.92307692 0.96 0.95652174 0.95238095] mean value: 0.8870318643979971 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.64285714 0.69230769 1. 1. 1. 0.85714286 0.92307692 0.91666667 1. ] mean value: 0.9032051282051282 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.81818182 0.81818182 1. 0.75 0.83333333 1. 1. 1. 0.90909091] mean value: 0.8856060606060606 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86363636 0.70075758 0.74242424 1. 0.875 0.91666667 0.90909091 0.95454545 0.95454545 0.95454545] mean value: 0.8871212121212121 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.72727273 0.5625 0.6 1. 0.75 0.83333333 0.85714286 0.92307692 0.91666667 0.90909091] mean value: 0.8079083416583417 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.56 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10700893 0.10751104 0.10628223 0.1066637 0.10690093 0.10731983 0.1071713 0.10743928 0.1068995 0.10708857] mean value: 0.10702853202819824 key: score_time value: [0.01878786 0.01899338 0.01907802 0.0190022 0.01901197 0.0189383 0.01898313 0.01904893 0.0191102 0.019032 ] mean value: 0.018998599052429198 key: test_mcc value: [0.66414149 0.6992059 0.48856385 0.39727608 0.41096386 0.65151515 0.91605722 0.58002308 0.46225016 0.54772256] mean value: 0.5817719350962288 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.82608696 0.82608696 0.73913043 0.69565217 0.69565217 0.82608696 0.95652174 0.7826087 0.72727273 0.77272727] mean value: 0.7847826086956522 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.83333333 0.84615385 0.75 0.63157895 0.66666667 0.83333333 0.96 0.81481481 0.7 0.76190476] mean value: 0.7797785703575177 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.76923077 0.73333333 0.69230769 0.75 0.77777778 0.83333333 0.92307692 0.73333333 0.77777778 0.8 ] mean value: 0.7790170940170941 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 1. 0.81818182 0.54545455 0.58333333 0.83333333 1. 0.91666667 0.63636364 0.72727273] mean value: 0.796969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.82954545 0.83333333 0.74242424 0.68939394 0.70075758 0.82575758 0.95454545 0.77651515 0.72727273 0.77272727] mean value: 0.7852272727272727 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.71428571 0.73333333 0.6 0.46153846 0.5 0.71428571 0.92307692 0.6875 0.53846154 0.61538462] mean value: 0.6487866300366301 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.33 Accuracy on Blind test: 0.66 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01075053 0.01030135 0.00908804 0.00905085 0.00904369 0.01010799 0.0100894 0.0101018 0.00930357 0.00907087] mean value: 0.009690809249877929 key: score_time value: [0.01013565 0.00923514 0.00874352 0.00862527 0.00860524 0.00943875 0.00945067 0.00937891 0.00866175 0.00867438] mean value: 0.009094929695129395 key: test_mcc value: [ 0.47727273 0.48856385 -0.04545455 0.48075018 0.44411739 -0.03816905 0.13740858 0.21374669 -0.09759001 -0.18257419] mean value: 0.18780716298837632 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.73913043 0.47826087 0.73913043 0.69565217 0.47826087 0.56521739 0.60869565 0.45454545 0.40909091] mean value: 0.5907114624505929 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72727273 0.75 0.45454545 0.7 0.63157895 0.45454545 0.54545455 0.64 0.33333333 0.43478261] mean value: 0.5671513071215588 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.72727273 0.69230769 0.45454545 0.77777778 0.85714286 0.5 0.6 0.61538462 0.42857143 0.41666667] mean value: 0.6069669219669219 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.81818182 0.45454545 0.63636364 0.5 0.41666667 0.5 0.66666667 0.27272727 0.45454545] mean value: 0.5446969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73863636 0.74242424 0.47727273 0.73484848 0.70454545 0.48106061 0.56818182 0.60606061 0.45454545 0.40909091] mean value: 0.5916666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57142857 0.6 0.29411765 0.53846154 0.46153846 0.29411765 0.375 0.47058824 0.2 0.27777778] mean value: 0.4083029878618114 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.57 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.32547903 1.41488957 1.40615559 1.33278108 1.30856895 1.30703354 1.31076121 1.31396508 1.30783343 1.31076026] mean value: 1.333822774887085 key: score_time value: [0.15590978 0.09691024 0.09611034 0.08798575 0.09535575 0.09329295 0.08852673 0.09049702 0.09529018 0.08904791] mean value: 0.0988926649093628 key: test_mcc value: [0.74047959 0.63327851 0.39393939 0.65151515 0.74242424 0.83971912 0.82575758 0.65909298 0.63636364 0.73029674] mean value: 0.6852866944934071 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.7826087 0.69565217 0.82608696 0.86956522 0.91304348 0.91304348 0.82608696 0.81818182 0.86363636] mean value: 0.8377470355731225 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.81481481 0.69565217 0.81818182 0.86956522 0.90909091 0.91666667 0.84615385 0.81818182 0.85714286] mean value: 0.8402592978679935 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.6875 0.66666667 0.81818182 0.90909091 1. 0.91666667 0.78571429 0.81818182 0.9 ] mean value: 0.8402002164502165 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.72727273 0.81818182 0.83333333 0.83333333 0.91666667 0.91666667 0.81818182 0.81818182] mean value: 0.85 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 0.79166667 0.6969697 0.82575758 0.87121212 0.91666667 0.91287879 0.8219697 0.81818182 0.86363636] mean value: 0.8386363636363636 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.6875 0.53333333 0.69230769 0.76923077 0.83333333 0.84615385 0.73333333 0.69230769 0.75 ] mean value: 0.72875 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.25 Accuracy on Blind test: 0.61 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.8421793 0.98309755 0.89729023 0.8816216 0.92900515 0.9211638 0.88418436 0.92679811 0.93264508 0.93030334] mean value: 0.9128288507461548 key: score_time value: [0.2220726 0.20171928 0.21628141 0.22029281 0.17704153 0.23349524 0.24436331 0.23651242 0.19798326 0.22944474] mean value: 0.21792066097259521 key: test_mcc value: [0.65151515 0.63327851 0.48856385 0.65151515 0.56490196 0.83971912 0.74047959 0.74047959 0.63636364 0.73029674] mean value: 0.6677113300585276 key: train_mcc value: [0.97077583 0.97077583 0.98067223 0.9516192 0.96116136 0.96116136 0.95163291 0.9707786 0.95186015 0.97091955] mean value: 0.9641356995103791 key: test_accuracy value: [0.82608696 0.7826087 0.73913043 0.82608696 0.7826087 0.91304348 0.86956522 0.86956522 0.81818182 0.86363636] mean value: 0.8290513833992095 key: train_accuracy value: [0.98536585 0.98536585 0.9902439 0.97560976 0.9804878 0.9804878 0.97560976 0.98536585 0.97572816 0.98543689] mean value: 0.9819701633909543 key: test_fscore value: [0.81818182 0.81481481 0.75 0.81818182 0.8 0.90909091 0.88 0.88 0.81818182 0.85714286] mean value: 0.8345594035594036 key: train_fscore value: [0.98550725 0.98550725 0.99038462 0.97607656 0.98058252 0.98058252 0.97584541 0.98536585 0.97607656 0.98550725] mean value: 0.9821435777393142 key: test_precision value: [0.81818182 0.6875 0.69230769 0.81818182 0.76923077 1. 0.84615385 0.84615385 0.81818182 0.9 ] mean value: 0.8195891608391609 key: train_precision value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( [0.98076923 0.98076923 0.98095238 0.96226415 0.97115385 0.97115385 0.96190476 0.98058252 0.96226415 0.98076923] mean value: 0.9732583353631165 key: test_recall value: [0.81818182 1. 0.81818182 0.81818182 0.83333333 0.83333333 0.91666667 0.91666667 0.81818182 0.81818182] mean value: 0.8590909090909091 key: train_recall value: [0.99029126 0.99029126 1. 0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 0.99029126 0.99029126] mean value: 0.9912240624405102 key: test_roc_auc value: [0.82575758 0.79166667 0.74242424 0.82575758 0.78030303 0.91666667 0.86742424 0.86742424 0.81818182 0.86363636] mean value: 0.8299242424242425 key: train_roc_auc value: [0.98534171 0.98534171 0.99019608 0.97553779 0.98053493 0.98053493 0.97568056 0.9853893 0.97572816 0.98543689] mean value: 0.9819722063582714 key: test_jcc value: [0.69230769 0.6875 0.6 0.69230769 0.66666667 0.83333333 0.78571429 0.78571429 0.69230769 0.75 ] mean value: 0.7185851648351649 key: train_jcc value: [0.97142857 0.97142857 0.98095238 0.95327103 0.96190476 0.96190476 0.95283019 0.97115385 0.95327103 0.97142857] mean value: 0.9649573709955477 MCC on Blind test: 0.28 Accuracy on Blind test: 0.62 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02420473 0.01008201 0.01005197 0.01018906 0.01016712 0.01026821 0.01024914 0.01026487 0.01008368 0.0101018 ] mean value: 0.011566257476806641 key: score_time value: [0.01045871 0.00948524 0.00967598 0.00949144 0.00947976 0.0094893 0.00947714 0.00952911 0.00955606 0.00959349] mean value: 0.00962362289428711 key: test_mcc value: [0.65909298 0.21452908 0.12336594 0.21452908 0.08257228 0.44411739 0.08257228 0.23262105 0.32539569 0.23570226] mean value: 0.26144980489209724 key: train_mcc value: [0.431714 0.44379575 0.47690661 0.38794503 0.41929975 0.43858746 0.45614118 0.4454215 0.40723148 0.39531893] mean value: 0.430236169300088 key: test_accuracy value: [0.82608696 0.60869565 0.56521739 0.60869565 0.52173913 0.69565217 0.52173913 0.60869565 0.63636364 0.59090909] mean value: 0.6183794466403162 key: train_accuracy value: [0.70243902 0.70731707 0.72195122 0.67804878 0.69268293 0.70731707 0.71707317 0.70731707 0.69417476 0.68446602] mean value: 0.7012787118162443 key: test_fscore value: [0.8 0.52631579 0.44444444 0.52631579 0.35294118 0.63157895 0.35294118 0.57142857 0.5 0.4 ] mean value: 0.5105965895129981 key: train_fscore value: [0.64327485 0.64705882 0.66272189 0.60240964 0.61349693 0.64705882 0.6627907 0.63855422 0.64 0.61538462] mean value: 0.6372750495347176 key: test_precision value: [0.88888889 0.625 0.57142857 0.625 0.6 0.85714286 0.6 0.66666667 0.8 0.75 ] mean value: 0.6984126984126984 key: train_precision value: [0.80882353 0.82089552 0.84848485 0.79365079 0.81967213 0.80882353 0.81428571 0.828125 0.77777778 0.78787879] mean value: 0.8108417634437052 key: test_recall value: [0.72727273 0.45454545 0.36363636 0.45454545 0.25 0.5 0.25 0.5 0.36363636 0.27272727] mean value: 0.41363636363636364 key: train_recall value: [0.53398058 0.53398058 0.54368932 0.48543689 0.49019608 0.53921569 0.55882353 0.51960784 0.54368932 0.50485437] mean value: 0.5253474205216067 key: test_roc_auc value: [0.8219697 0.60227273 0.55681818 0.60227273 0.53409091 0.70454545 0.53409091 0.61363636 0.63636364 0.59090909] mean value: 0.6196969696969696 key: train_roc_auc value: [0.7032648 0.70816676 0.72282505 0.67899296 0.69169998 0.70650105 0.71630497 0.70640586 0.69417476 0.68446602] mean value: 0.7012802208261946 key: test_jcc value: [0.66666667 0.35714286 0.28571429 0.35714286 0.21428571 0.46153846 0.21428571 0.4 0.33333333 0.25 ] mean value: 0.354010989010989 key: train_jcc value: [0.47413793 0.47826087 0.49557522 0.43103448 0.44247788 0.47826087 0.49565217 0.46902655 0.47058824 0.44444444] mean value: 0.4679458652592843 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.2580533 0.05313063 0.05868006 0.05744362 0.05369687 0.06253695 0.06104183 0.06061697 0.06067872 0.07018757] mean value: 0.07960665225982666 key: score_time value: [0.01125717 0.01169109 0.01044297 0.01053381 0.01151872 0.0110507 0.01123476 0.01140285 0.01065278 0.01143217] mean value: 0.011121702194213868 key: test_mcc value: [0.58002308 0.58930667 0.66414149 1. 0.74242424 0.83971912 0.83743579 1. 1. 1. ] mean value: 0.8253050384253398 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 0.7826087 0.82608696 1. 0.86956522 0.91304348 0.91304348 1. 1. 1. ] mean value: 0.908695652173913 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73684211 0.8 0.83333333 1. 0.86956522 0.90909091 0.92307692 1. 1. 1. ] mean value: 0.9071908488155628 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.875 0.71428571 0.76923077 1. 0.90909091 1. 0.85714286 1. 1. 1. ] mean value: 0.912475024975025 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.90909091 0.90909091 1. 0.83333333 0.83333333 1. 1. 1. 1. ] mean value: 0.9121212121212121 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77651515 0.78787879 0.82954545 1. 0.87121212 0.91666667 0.90909091 1. 1. 1. ] mean value: 0.9090909090909091 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58333333 0.66666667 0.71428571 1. 0.76923077 0.83333333 0.85714286 1. 1. 1. ] mean value: 0.8423992673992674 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.52 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.02735448 0.02864647 0.0365417 0.02909827 0.02844095 0.02818394 0.05325174 0.04603481 0.02615213 0.02718377] mean value: 0.03308882713317871 key: score_time value: [0.01265574 0.0118897 0.01186728 0.01191807 0.01186371 0.01190019 0.02155924 0.01197457 0.01198792 0.0118432 ] mean value: 0.012945961952209473 key: test_mcc value: [0.48075018 0.65909298 0.76764947 0.56490196 0.58930667 0.65909298 0.65151515 0.58930667 0.81818182 0.68313005] mean value: 0.6462927922925813 key: train_mcc value: [0.93174679 0.96116136 0.95126131 0.94146202 0.9707786 0.96116136 0.96116136 0.96097468 0.93208276 0.94192516] mean value: 0.9513715399106522 key: test_accuracy value: [0.73913043 0.82608696 0.86956522 0.7826087 0.7826087 0.82608696 0.82608696 0.7826087 0.90909091 0.81818182] mean value: 0.8162055335968379 key: train_accuracy value: [0.96585366 0.9804878 0.97560976 0.97073171 0.98536585 0.9804878 0.9804878 0.9804878 0.96601942 0.97087379] mean value: 0.9756405399005447 key: test_fscore value: [0.7 0.8 0.88 0.76190476 0.76190476 0.84615385 0.83333333 0.76190476 0.90909091 0.77777778] mean value: 0.8032070152070152 key: train_fscore value: [0.96618357 0.98039216 0.97584541 0.97087379 0.98536585 0.98058252 0.98058252 0.98039216 0.96618357 0.97115385] mean value: 0.9757555408875802 key: test_precision value: [0.77777778 0.88888889 0.78571429 0.8 0.88888889 0.78571429 0.83333333 0.88888889 0.90909091 1. ] mean value: 0.8558297258297258 key: train_precision value: [0.96153846 0.99009901 0.97115385 0.97087379 0.98058252 0.97115385 0.97115385 0.98039216 0.96153846 0.96190476] mean value: 0.972039070088657 key: test_recall value: [0.63636364 0.72727273 1. 0.72727273 0.66666667 0.91666667 0.83333333 0.66666667 0.90909091 0.63636364] mean value: 0.771969696969697 key: train_recall value: [0.97087379 0.97087379 0.98058252 0.97087379 0.99019608 0.99019608 0.99019608 0.98039216 0.97087379 0.98058252] mean value: 0.979564058633162 key: test_roc_auc value: [0.73484848 0.8219697 0.875 0.78030303 0.78787879 0.8219697 0.82575758 0.78787879 0.90909091 0.81818182] mean value: 0.8162878787878787 key: train_roc_auc value: [0.96582905 0.98053493 0.97558538 0.97073101 0.9853893 0.98053493 0.98053493 0.98048734 0.96601942 0.97087379] mean value: 0.9756520083761661 key: test_jcc value: [0.53846154 0.66666667 0.78571429 0.61538462 0.61538462 0.73333333 0.71428571 0.61538462 0.83333333 0.63636364] mean value: 0.6754312354312354 key: train_jcc value: [0.93457944 0.96153846 0.95283019 0.94339623 0.97115385 0.96190476 0.96190476 0.96153846 0.93457944 0.94392523] mean value: 0.9527350820284165 MCC on Blind test: 0.18 Accuracy on Blind test: 0.59 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02102685 0.0097661 0.00906491 0.00905061 0.008991 0.00937319 0.0090394 0.00941396 0.00884175 0.00902915] mean value: 0.010359692573547363 key: score_time value: [0.00948429 0.00991464 0.00870991 0.00855541 0.0086112 0.00870061 0.00849533 0.00886989 0.00867701 0.00849438] mean value: 0.008851265907287598 key: test_mcc value: [0.38932432 0.23262105 0.3030303 0.12878788 0.39393939 0.21374669 0.5164589 0.21969697 0.09245003 0.37796447] mean value: 0.2868020012621178 key: train_mcc value: [0.35608875 0.3660859 0.37560698 0.42436935 0.3755949 0.35621133 0.3658258 0.41462022 0.43763636 0.40824829] mean value: 0.3880287875437212 key: test_accuracy value: [0.69565217 0.60869565 0.65217391 0.56521739 0.69565217 0.60869565 0.73913043 0.60869565 0.54545455 0.68181818] mean value: 0.6401185770750988 key: train_accuracy value: [0.67804878 0.68292683 0.68780488 0.71219512 0.68780488 0.67804878 0.68292683 0.70731707 0.7184466 0.7038835 ] mean value: 0.6939403267819086 key: test_fscore value: [0.66666667 0.64 0.63636364 0.54545455 0.69565217 0.64 0.7 0.60869565 0.58333333 0.63157895] mean value: 0.634774495527356 key: train_fscore value: [0.68269231 0.67980296 0.69230769 0.71497585 0.68627451 0.67961165 0.67980296 0.70588235 0.72641509 0.71090047] mean value: 0.6958665838244484 key: test_precision value: [0.7 0.57142857 0.63636364 0.54545455 0.72727273 0.61538462 0.875 0.63636364 0.53846154 0.75 ] mean value: 0.659572927072927 key: train_precision value: [0.67619048 0.69 0.68571429 0.71153846 0.68627451 0.67307692 0.68316832 0.70588235 0.70642202 0.69444444] mean value: 0.6912711788889996 key: test_recall value: [0.63636364 0.72727273 0.63636364 0.54545455 0.66666667 0.66666667 0.58333333 0.58333333 0.63636364 0.54545455] mean value: 0.6227272727272727 key: train_recall value: [0.68932039 0.66990291 0.69902913 0.7184466 0.68627451 0.68627451 0.67647059 0.70588235 0.74757282 0.72815534] mean value: 0.7007329145250334 key: test_roc_auc value: [0.69318182 0.61363636 0.65151515 0.56439394 0.6969697 0.60606061 0.74621212 0.60984848 0.54545455 0.68181818] mean value: 0.6409090909090909 key: train_roc_auc value: [0.67799353 0.68299067 0.68774986 0.71216448 0.68779745 0.67808871 0.68289549 0.70731011 0.7184466 0.7038835 ] mean value: 0.6939320388349515 key: test_jcc value: [0.5 0.47058824 0.46666667 0.375 0.53333333 0.47058824 0.53846154 0.4375 0.41176471 0.46153846] mean value: 0.46654411764705883 key: train_jcc value: [0.51824818 0.51492537 0.52941176 0.55639098 0.52238806 0.51470588 0.51492537 0.54545455 0.57037037 0.55147059] mean value: 0.5338291109715274 MCC on Blind test: 0.36 Accuracy on Blind test: 0.68 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01177096 0.01734757 0.01680732 0.01921105 0.01541638 0.01629424 0.01723695 0.01650763 0.01626301 0.01551557] mean value: 0.01623706817626953 key: score_time value: [0.00875068 0.01169848 0.01196933 0.01190495 0.01174927 0.01198626 0.01195526 0.01173544 0.01152921 0.01165438] mean value: 0.011493325233459473 key: test_mcc value: [0.91666667 0.5164589 0.41096386 0.74242424 0.56818182 0.6992059 0.58930667 0.74242424 0.54232614 0.36514837] mean value: 0.6093106813380207 key: train_mcc value: [0.80613459 0.94164684 0.91429989 0.94163576 0.88310329 0.82593778 0.67701604 0.91224062 0.79681907 0.89527379] mean value: 0.8594107664325119 key: test_accuracy value: [0.95652174 0.73913043 0.69565217 0.86956522 0.7826087 0.82608696 0.7826087 0.86956522 0.72727273 0.68181818] mean value: 0.7930830039525691 key: train_accuracy value: [0.89756098 0.97073171 0.95609756 0.97073171 0.94146341 0.90731707 0.81463415 0.95609756 0.88834951 0.94660194] mean value: 0.9249585602652143 key: test_fscore value: [0.95652174 0.76923077 0.72 0.86956522 0.7826087 0.8 0.76190476 0.86956522 0.78571429 0.66666667] mean value: 0.79817773530817 key: train_fscore value: [0.9058296 0.97058824 0.95774648 0.97115385 0.94174757 0.89839572 0.77108434 0.95609756 0.89956332 0.94835681] mean value: 0.9220563476088464 key: test_precision value: [0.91666667 0.66666667 0.64285714 0.83333333 0.81818182 1. 0.88888889 0.90909091 0.64705882 0.7 ] mean value: 0.8022744249214837 key: train_precision value: [0.84166667 0.98019802 0.92727273 0.96190476 0.93269231 0.98823529 1. 0.95145631 0.81746032 0.91818182] mean value: 0.9319068223777838 key: test_recall value: [1. 0.90909091 0.81818182 0.90909091 0.75 0.66666667 0.66666667 0.83333333 1. 0.63636364] mean value: 0.818939393939394 key: train_recall value: [0.98058252 0.96116505 0.99029126 0.98058252 0.95098039 0.82352941 0.62745098 0.96078431 1. 0.98058252] mean value: 0.9255948981534361 key: test_roc_auc value: [0.95833333 0.74621212 0.70075758 0.87121212 0.78409091 0.83333333 0.78787879 0.87121212 0.72727273 0.68181818] mean value: 0.7962121212121211 key: train_roc_auc value: [0.89715401 0.9707786 0.95592994 0.97068342 0.94150961 0.90691034 0.81372549 0.95612031 0.88834951 0.94660194] mean value: 0.9247763182943081 key: test_jcc value: [0.91666667 0.625 0.5625 0.76923077 0.64285714 0.66666667 0.61538462 0.76923077 0.64705882 0.5 ] mean value: 0.6714595453566042 key: train_jcc value: [0.82786885 0.94285714 0.91891892 0.94392523 0.88990826 0.81553398 0.62745098 0.91588785 0.81746032 0.90178571] mean value: 0.8601597247948675 MCC on Blind test: 0.31 Accuracy on Blind test: 0.65 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01530957 0.01498342 0.01542163 0.01644206 0.01473165 0.01477385 0.01383638 0.0155046 0.01537108 0.01408386] mean value: 0.015045809745788574 key: score_time value: [0.01238942 0.01142144 0.01168561 0.01138973 0.01165462 0.01171398 0.01134014 0.01167512 0.01169944 0.01163602] mean value: 0.011660552024841309 key: test_mcc value: [0.32232919 0.41096386 0.3030303 0.56490196 0.76277007 0.82575758 0.40451992 0.66414149 0.40824829 0.40824829] mean value: 0.5074910937104659 key: train_mcc value: [0.47469541 0.90259929 0.9707786 0.92479811 0.86761151 0.86052253 0.37926401 0.803912 0.61850654 0.82977382] mean value: 0.7632461822687774 key: test_accuracy value: [0.60869565 0.69565217 0.65217391 0.7826087 0.86956522 0.91304348 0.65217391 0.82608696 0.68181818 0.68181818] mean value: 0.7363636363636363 key: train_accuracy value: [0.68292683 0.95121951 0.98536585 0.96097561 0.93170732 0.92682927 0.62439024 0.89268293 0.77669903 0.90776699] mean value: 0.8640563580393086 key: test_fscore value: [0.30769231 0.72 0.63636364 0.76190476 0.88888889 0.91666667 0.75 0.81818182 0.74074074 0.58823529] mean value: 0.7128674114556468 key: train_fscore value: [0.53900709 0.95192308 0.98536585 0.95959596 0.93457944 0.92146597 0.72597865 0.87912088 0.81746032 0.89839572] mean value: 0.8612892956408041 key: test_precision value: [1. 0.64285714 0.63636364 0.8 0.8 0.91666667 0.6 0.9 0.625 0.83333333] mean value: 0.775422077922078 key: train_precision value: [1. 0.94285714 0.99019608 1. 0.89285714 0.98876404 0.5698324 1. 0.69127517 1. ] mean value: 0.907578197910935 key: test_recall value: [0.18181818 0.81818182 0.63636364 0.72727273 1. 0.91666667 1. 0.75 0.90909091 0.45454545] mean value: 0.7393939393939394 key: train_recall value: [0.36893204 0.96116505 0.98058252 0.9223301 0.98039216 0.8627451 1. 0.78431373 1. 0.81553398] mean value: 0.8675994669712546 key: test_roc_auc value: [0.59090909 0.70075758 0.65151515 0.78030303 0.86363636 0.91287879 0.63636364 0.82954545 0.68181818 0.68181818] mean value: 0.7329545454545454 key: train_roc_auc value: [0.68446602 0.95117076 0.9853893 0.96116505 0.93194365 0.92651818 0.62621359 0.89215686 0.77669903 0.90776699] mean value: 0.8643489434608795 key: test_jcc value: [0.18181818 0.5625 0.46666667 0.61538462 0.8 0.84615385 0.6 0.69230769 0.58823529 0.41666667] mean value: 0.5769732963115316 key: train_jcc value: [0.36893204 0.90825688 0.97115385 0.9223301 0.87719298 0.85436893 0.5698324 0.78431373 0.69127517 0.81553398] mean value: 0.7763190053397688 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.14774132 0.12597418 0.12698054 0.12672639 0.12538242 0.1247561 0.12422967 0.12333274 0.12338328 0.1225481 ] mean value: 0.1271054744720459 key: score_time value: [0.01492047 0.01496315 0.0151732 0.01502323 0.01556945 0.01488686 0.01481652 0.01488638 0.0149827 0.01499629] mean value: 0.015021824836730957 key: test_mcc value: [0.91605722 0.58930667 0.66414149 1. 0.66414149 0.91666667 0.76277007 0.82575758 1. 1. ] mean value: 0.8338841181236702 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95652174 0.7826087 0.82608696 1. 0.82608696 0.95652174 0.86956522 0.91304348 1. 1. ] mean value: 0.9130434782608696 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95238095 0.8 0.83333333 1. 0.81818182 0.95652174 0.88888889 0.91666667 1. 1. ] mean value: 0.9165973398582095 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.71428571 0.76923077 1. 0.9 1. 0.8 0.91666667 1. 1. ] mean value: 0.910018315018315 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.90909091 0.90909091 1. 0.75 0.91666667 1. 0.91666667 1. 1. ] mean value: 0.931060606060606 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95454545 0.78787879 0.82954545 1. 0.82954545 0.95833333 0.86363636 0.91287879 1. 1. ] mean value: 0.9136363636363636 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90909091 0.66666667 0.71428571 1. 0.69230769 0.91666667 0.8 0.84615385 1. 1. ] mean value: 0.8545171495171495 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.52 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.04642153 0.04462147 0.04027104 0.04553676 0.04686022 0.05081582 0.03903246 0.04629207 0.04119587 0.04431295] mean value: 0.04453601837158203 key: score_time value: [0.01880813 0.02288532 0.02414727 0.0243566 0.02825975 0.02347636 0.02017665 0.02507305 0.02807355 0.02321911] mean value: 0.023847579956054688 key: test_mcc value: [0.50168817 0.58930667 0.56818182 1. 0.76764947 0.74242424 0.83743579 0.91666667 0.91287093 0.81818182] mean value: 0.7654405571559706 key: train_mcc value: [1. 0.98048734 0.98067587 1. 0.99029034 0.98067223 0.99029034 0.97114302 0.97128586 0.99033794] mean value: 0.9855182943104235 key: test_accuracy value: [0.73913043 0.7826087 0.7826087 1. 0.86956522 0.86956522 0.91304348 0.95652174 0.95454545 0.90909091] mean value: 0.8776679841897234 key: train_accuracy value: [1. 0.9902439 0.9902439 1. 0.99512195 0.9902439 0.99512195 0.98536585 0.98543689 0.99514563] mean value: 0.9926923987686479 key: test_fscore value: [0.66666667 0.8 0.7826087 1. 0.85714286 0.86956522 0.92307692 0.95652174 0.95238095 0.90909091] mean value: 0.8717053960532221 key: train_fscore value: [1. 0.99029126 0.99019608 1. 0.99507389 0.99009901 0.99507389 0.98507463 0.98522167 0.99512195] mean value: 0.9926152386681547 key: test_precision value: [0.85714286 0.71428571 0.75 1. 1. 0.90909091 0.85714286 1. 1. 0.90909091] mean value: 0.8996753246753246 key: train_precision value: [1. 0.99029126 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9990291262135922 key: test_recall value: [0.54545455 0.90909091 0.81818182 1. 0.75 0.83333333 1. 0.91666667 0.90909091 0.90909091] mean value: 0.8590909090909091 key: train_recall value: [1. 0.99029126 0.98058252 1. 0.99019608 0.98039216 0.99019608 0.97058824 0.97087379 0.99029126] mean value: 0.9863411383971065 key: test_roc_auc value: [0.73106061 0.78787879 0.78409091 1. 0.875 0.87121212 0.90909091 0.95833333 0.95454545 0.90909091] mean value: 0.878030303030303 key: train_roc_auc value: [1. 0.99024367 0.99029126 1. 0.99509804 0.99019608 0.99509804 0.98529412 0.98543689 0.99514563] mean value: 0.9926803731201218 key: test_jcc value: [0.5 0.66666667 0.64285714 1. 0.75 0.76923077 0.85714286 0.91666667 0.90909091 0.83333333] mean value: 0.7844988344988345 key: train_jcc value: [1. 0.98076923 0.98058252 1. 0.99019608 0.98039216 0.99019608 0.97058824 0.97087379 0.99029126] mean value: 0.9853889352604372 MCC on Blind test: 0.03 Accuracy on Blind test: 0.51 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.045012 0.04318142 0.0282228 0.02828217 0.0310216 0.08236885 0.0660882 0.06425071 0.07074213 0.07848597] mean value: 0.05376558303833008 key: score_time value: [0.02342558 0.01257706 0.01256967 0.01259303 0.02418852 0.02631974 0.02399254 0.02026963 0.02379179 0.02296281] mean value: 0.020269036293029785 key: test_mcc value: [0.47727273 0.48856385 0.31252706 0.03816905 0.5164589 0.66414149 0.5164589 0.38932432 0.29277002 0.18898224] mean value: 0.3884668558613338 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99033794] mean value: 0.9990337937660287 key: test_accuracy value: [0.73913043 0.73913043 0.65217391 0.52173913 0.73913043 0.82608696 0.73913043 0.69565217 0.63636364 0.59090909] mean value: 0.6879446640316206 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99514563] mean value: 0.9995145631067961 key: test_fscore value: [0.72727273 0.75 0.55555556 0.47619048 0.7 0.81818182 0.7 0.72 0.55555556 0.52631579] mean value: 0.6529071922229817 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99512195] mean value: 0.9995121951219512 key: test_precision value: [0.72727273 0.69230769 0.71428571 0.5 0.875 0.9 0.875 0.69230769 0.71428571 0.625 ] mean value: 0.731545954045954 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.81818182 0.45454545 0.45454545 0.58333333 0.75 0.58333333 0.75 0.45454545 0.45454545] mean value: 0.603030303030303 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99029126] mean value: 0.9990291262135922 key: test_roc_auc value: [0.73863636 0.74242424 0.64393939 0.51893939 0.74621212 0.82954545 0.74621212 0.69318182 0.63636364 0.59090909] mean value: 0.6886363636363636 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99514563] mean value: 0.9995145631067961 key: test_jcc value: [0.57142857 0.6 0.38461538 0.3125 0.53846154 0.69230769 0.53846154 0.5625 0.38461538 0.35714286] mean value: 0.4942032967032967 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99029126] mean value: 0.9990291262135922 MCC on Blind test: 0.17 Accuracy on Blind test: 0.59 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.43832111 0.41460967 0.41730499 0.418607 0.41266894 0.41043901 0.41033554 0.40992641 0.41230726 0.40911317] mean value: 0.4153633117675781 key: score_time value: [0.00984025 0.00928712 0.00983596 0.00985432 0.00924468 0.0091536 0.00908923 0.00908732 0.00944901 0.00996041] mean value: 0.00948019027709961 key: test_mcc value: [0.74047959 0.5164589 0.48856385 1. 0.76764947 0.91666667 0.76277007 1. 1. 1. ] mean value: 0.8192588557461732 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.73913043 0.73913043 1. 0.86956522 0.95652174 0.86956522 1. 1. 1. ] mean value: 0.9043478260869565 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.76923077 0.75 1. 0.85714286 0.95652174 0.88888889 1. 1. 1. ] mean value: 0.9078927111535807 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.66666667 0.69230769 1. 1. 1. 0.8 1. 1. 1. ] mean value: 0.9058974358974359 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.90909091 0.81818182 1. 0.75 0.91666667 1. 1. 1. 1. ] mean value: 0.9212121212121213 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 0.74621212 0.74242424 1. 0.875 0.95833333 0.86363636 1. 1. 1. ] mean value: 0.9053030303030303 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.625 0.6 1. 0.75 0.91666667 0.8 1. 1. 1. ] mean value: 0.8441666666666666 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.53 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02009869 0.02192163 0.02121377 0.02089286 0.03631091 0.02019191 0.0366838 0.02073383 0.03662992 0.03478622] mean value: 0.026946353912353515 key: score_time value: [0.01233768 0.01223588 0.01664376 0.01710796 0.01231337 0.01765871 0.01238084 0.017627 0.01230049 0.01220369] mean value: 0.014280939102172851 key: test_mcc value: [0.47727273 0.83971912 0.23262105 0.56818182 0.91605722 0.62050523 0.91605722 0.65909298 0.63636364 0.64715023] mean value: 0.6513021246123849 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.91304348 0.60869565 0.7826087 0.95652174 0.7826087 0.95652174 0.82608696 0.81818182 0.81818182] mean value: 0.8201581027667985 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72727273 0.91666667 0.64 0.7826087 0.96 0.82758621 0.96 0.84615385 0.81818182 0.83333333] mean value: 0.8311803294157117 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.72727273 0.84615385 0.57142857 0.75 0.92307692 0.70588235 0.92307692 0.78571429 0.81818182 0.76923077] mean value: 0.782001821707704 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 1. 0.72727273 0.81818182 1. 1. 1. 0.91666667 0.81818182 0.90909091] mean value: 0.8916666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73863636 0.91666667 0.61363636 0.78409091 0.95454545 0.77272727 0.95454545 0.8219697 0.81818182 0.81818182] mean value: 0.8193181818181818 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57142857 0.84615385 0.47058824 0.64285714 0.92307692 0.70588235 0.92307692 0.73333333 0.69230769 0.71428571] mean value: 0.7222990734755441 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.21 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02222586 0.02766562 0.0338943 0.02159619 0.03423214 0.03410792 0.03486061 0.03411412 0.03433919 0.03421783] mean value: 0.031125378608703614 key: score_time value: [0.02374887 0.02142906 0.02060509 0.02018595 0.02212143 0.0235455 0.02057886 0.02279162 0.0233283 0.02117443] mean value: 0.021950912475585938 key: test_mcc value: [0.82575758 0.74242424 0.56818182 0.82575758 0.66414149 0.82575758 0.74047959 0.82575758 0.63636364 0.46225016] mean value: 0.7116871241591681 key: train_mcc value: [0.9024367 0.93175328 0.92194936 0.91224062 0.93175328 0.93174679 0.94163576 0.90259929 0.89324598 0.93243443] mean value: 0.9201795515216483 key: test_accuracy value: [0.91304348 0.86956522 0.7826087 0.91304348 0.82608696 0.91304348 0.86956522 0.91304348 0.81818182 0.72727273] mean value: 0.8545454545454545 key: train_accuracy value: [0.95121951 0.96585366 0.96097561 0.95609756 0.96585366 0.96585366 0.97073171 0.95121951 0.94660194 0.96601942] mean value: 0.9600426237272082 key: test_fscore value: [0.90909091 0.86956522 0.7826087 0.90909091 0.81818182 0.91666667 0.88 0.91666667 0.81818182 0.7 ] mean value: 0.8520052700922266 key: train_fscore value: [0.95145631 0.96585366 0.96116505 0.95609756 0.96585366 0.96551724 0.97029703 0.95049505 0.9468599 0.96650718] mean value: 0.9600102638274448 key: test_precision value: [0.90909091 0.83333333 0.75 0.90909091 0.9 0.91666667 0.84615385 0.91666667 0.81818182 0.77777778] mean value: 0.8576961926961927 key: train_precision value: [0.95145631 0.97058824 0.96116505 0.96078431 0.96116505 0.97029703 0.98 0.96 0.94230769 0.95283019] mean value: 0.9610593867476506 key: test_recall value: [0.90909091 0.90909091 0.81818182 0.90909091 0.75 0.91666667 0.91666667 0.91666667 0.81818182 0.63636364] mean value: 0.85 key: train_recall value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:135: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:138: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [0.95145631 0.96116505 0.96116505 0.95145631 0.97058824 0.96078431 0.96078431 0.94117647 0.95145631 0.98058252] mean value: 0.9590614886731392 key: test_roc_auc value: [0.91287879 0.87121212 0.78409091 0.91287879 0.82954545 0.91287879 0.86742424 0.91287879 0.81818182 0.72727273] mean value: 0.8549242424242424 key: train_roc_auc value: [0.95121835 0.96587664 0.96097468 0.95612031 0.96587664 0.96582905 0.97068342 0.95117076 0.94660194 0.96601942] mean value: 0.9600371216447744 key: test_jcc value: [0.83333333 0.76923077 0.64285714 0.83333333 0.69230769 0.84615385 0.78571429 0.84615385 0.69230769 0.53846154] mean value: 0.747985347985348 key: train_jcc value: [0.90740741 0.93396226 0.92523364 0.91588785 0.93396226 0.93333333 0.94230769 0.90566038 0.89908257 0.93518519] mean value: 0.9232022588028438 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.34404731 0.23672104 0.2441833 0.23749495 0.25613022 0.24685955 0.24196959 0.28543353 0.30825043 0.23037481] mean value: 0.2631464719772339 key: score_time value: [0.02091432 0.02341533 0.02169251 0.01936412 0.01624894 0.0224669 0.01370192 0.01648545 0.02128911 0.02203679] mean value: 0.019761538505554198 key: test_mcc value: [0.82575758 0.65909298 0.56818182 0.82575758 0.66414149 0.82575758 0.74047959 0.82575758 0.73029674 0.46225016] mean value: 0.7127473088509607 key: train_mcc value: [0.9024367 0.93175328 0.92194936 0.91224062 0.95163291 0.93174679 0.94163576 0.90259929 0.93208276 0.93243443] mean value: 0.9260511918867643 key: test_accuracy value: [0.91304348 0.82608696 0.7826087 0.91304348 0.82608696 0.91304348 0.86956522 0.91304348 0.86363636 0.72727273] mean value: 0.8547430830039525 key: train_accuracy value: [0.95121951 0.96585366 0.96097561 0.95609756 0.97560976 0.96585366 0.97073171 0.95121951 0.96601942 0.96601942] mean value: 0.9629599810561212 key: test_fscore value: [0.90909091 0.8 0.7826087 0.90909091 0.81818182 0.91666667 0.88 0.91666667 0.85714286 0.7 ] mean value: 0.8489448522492 key: train_fscore value: [0.95145631 0.96585366 0.96116505 0.95609756 0.97584541 0.96551724 0.97029703 0.95049505 0.96618357 0.96650718] mean value: 0.9629418061863466 key: test_precision value: [0.90909091 0.88888889 0.75 0.90909091 0.9 0.91666667 0.84615385 0.91666667 0.9 0.77777778] mean value: 0.8714335664335664 key: train_precision value: [0.95145631 0.97058824 0.96116505 0.96078431 0.96190476 0.97029703 0.98 0.96 0.96153846 0.95283019] mean value: 0.9630564350068348 key: test_recall value: [0.90909091 0.72727273 0.81818182 0.90909091 0.75 0.91666667 0.91666667 0.91666667 0.81818182 0.63636364] mean value: 0.8318181818181818 key: train_recall value: [0.95145631 0.96116505 0.96116505 0.95145631 0.99019608 0.96078431 0.96078431 0.94117647 0.97087379 0.98058252] mean value: 0.9629640205596802 key: test_roc_auc value: [0.91287879 0.8219697 0.78409091 0.91287879 0.82954545 0.91287879 0.86742424 0.91287879 0.86363636 0.72727273] mean value: 0.8545454545454545 key: train_roc_auc value: [0.95121835 0.96587664 0.96097468 0.95612031 0.97568056 0.96582905 0.97068342 0.95117076 0.96601942 0.96601942] mean value: 0.9629592613744528 key: test_jcc value: [0.83333333 0.66666667 0.64285714 0.83333333 0.69230769 0.84615385 0.78571429 0.84615385 0.75 0.53846154] mean value: 0.7434981684981685 key: train_jcc value: [0.90740741 0.93396226 0.92523364 0.91588785 0.95283019 0.93333333 0.94230769 0.90566038 0.93457944 0.93518519] mean value: 0.9286387383001737 MCC on Blind test: 0.12 Accuracy on Blind test: 0.56 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.0311234 0.03104281 0.03307748 0.03271508 0.03284502 0.03534842 0.0292809 0.03247356 0.0327239 0.0364511 ] mean value: 0.032708168029785156 key: score_time value: [0.01306868 0.01228809 0.01467776 0.01206303 0.01464558 0.01219606 0.01215053 0.01212502 0.0147841 0.01199746] mean value: 0.012999629974365235 key: test_mcc value: [0.74047959 0.5164589 0.48856385 0.56490196 0.74047959 0.83971912 0.74047959 0.91666667 0.63636364 0.18257419] mean value: 0.6366687092895582 key: train_mcc value: [0.86356283 0.84451258 0.87352395 0.8350976 0.82455974 0.84407425 0.88361919 0.81564443 0.81742389 0.91266437] mean value: 0.8514682834181677 key: test_accuracy value: [0.86956522 0.73913043 0.73913043 0.7826087 0.86956522 0.91304348 0.86956522 0.95652174 0.81818182 0.59090909] mean value: 0.8148221343873517 key: train_accuracy value: [0.93170732 0.92195122 0.93658537 0.91707317 0.91219512 0.92195122 0.94146341 0.90731707 0.90776699 0.95631068] mean value: 0.9254321572341937 key: test_fscore value: [0.85714286 0.76923077 0.75 0.76190476 0.88 0.90909091 0.88 0.95652174 0.81818182 0.57142857] mean value: 0.8153501426110121 key: train_fscore value: [0.93269231 0.92380952 0.93779904 0.91943128 0.91262136 0.9223301 0.94230769 0.90909091 0.91079812 0.95652174] mean value: 0.926740207309033 key: test_precision value: [0.9 0.66666667 0.69230769 0.8 0.84615385 1. 0.84615385 1. 0.81818182 0.6 ] mean value: 0.816946386946387 key: train_precision value: [0.92380952 0.90654206 0.9245283 0.89814815 0.90384615 0.91346154 0.9245283 0.88785047 0.88181818 0.95192308] mean value: 0.9116455750144694 key: test_recall value: [0.81818182 0.90909091 0.81818182 0.72727273 0.91666667 0.83333333 0.91666667 0.91666667 0.81818182 0.54545455] mean value: 0.821969696969697 key: train_recall value: [0.94174757 0.94174757 0.95145631 0.94174757 0.92156863 0.93137255 0.96078431 0.93137255 0.94174757 0.96116505] mean value: 0.9424709689701123 key: test_roc_auc value: [0.86742424 0.74621212 0.74242424 0.78030303 0.86742424 0.91666667 0.86742424 0.95833333 0.81818182 0.59090909] mean value: 0.8155303030303029 key: train_roc_auc value: [0.9316581 0.92185418 0.93651247 0.91695222 0.91224062 0.92199695 0.94155721 0.90743385 0.90776699 0.95631068] mean value: 0.925428326670474 key: test_jcc value: [0.75 0.625 0.6 0.61538462 0.78571429 0.83333333 0.78571429 0.91666667 0.69230769 0.4 ] mean value: 0.7004120879120879 key: train_jcc value: [0.87387387 0.85840708 0.88288288 0.85087719 0.83928571 0.85585586 0.89090909 0.83333333 0.8362069 0.91666667] mean value: 0.8638298586987616 MCC on Blind test: 0.34 Accuracy on Blind test: 0.67 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.93439674 0.79571605 0.78386688 0.91385674 0.78201485 0.82960582 0.88547063 0.7484262 0.93659878 0.74970579] mean value: 0.8359658479690552 key: score_time value: [0.01902795 0.01569152 0.0154388 0.01550126 0.01569033 0.01552463 0.01555157 0.01228809 0.01748419 0.01834369] mean value: 0.016054201126098632 key: test_mcc value: [0.74047959 0.56818182 0.56818182 0.65151515 0.76764947 0.91666667 0.56490196 0.91666667 0.75592895 0.46225016] mean value: 0.691242224971122 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99033794] mean value: 0.9990337937660287 key: test_accuracy value: [0.86956522 0.7826087 0.7826087 0.82608696 0.86956522 0.95652174 0.7826087 0.95652174 0.86363636 0.72727273] mean value: 0.841699604743083 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99514563] mean value: 0.9995145631067961 key: test_fscore value: [0.85714286 0.7826087 0.7826087 0.81818182 0.85714286 0.95652174 0.8 0.95652174 0.84210526 0.7 ] mean value: 0.8352833665190644 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99512195] mean value: 0.9995121951219512 key: test_precision value: [0.9 0.75 0.75 0.81818182 1. 1. 0.76923077 1. 1. 0.77777778] mean value: 0.8765190365190365 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.81818182 0.81818182 0.81818182 0.75 0.91666667 0.83333333 0.91666667 0.72727273 0.63636364] mean value: 0.8053030303030303 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99029126] mean value: 0.9990291262135922 key: test_roc_auc value: [0.86742424 0.78409091 0.78409091 0.82575758 0.875 0.95833333 0.78030303 0.95833333 0.86363636 0.72727273] mean value: 0.8424242424242424 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99514563] mean value: 0.9995145631067961 key: test_jcc value: [0.75 0.64285714 0.64285714 0.69230769 0.75 0.91666667 0.66666667 0.91666667 0.72727273 0.53846154] mean value: 0.7243756243756244 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.99029126] mean value: 0.9990291262135922 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.02677464 0.01045084 0.01013684 0.00999856 0.01005387 0.01009083 0.00991368 0.0105505 0.01012111 0.01038074] mean value: 0.011847162246704101 key: score_time value: [0.01160312 0.00989413 0.00953531 0.00954342 0.00953102 0.0099299 0.00961757 0.00996542 0.00982594 0.00982404] mean value: 0.009926986694335938 key: test_mcc value: [0.44411739 0.41096386 0.41096386 0.15096491 0.22407133 0.74047959 0.42228828 0.24960096 0.09759001 0.18898224] mean value: 0.3340022427057291 key: train_mcc value: [0.36627048 0.417866 0.45930893 0.40305908 0.501235 0.431714 0.49387839 0.43730041 0.46621721 0.50903935] mean value: 0.44858888387912876 key: test_accuracy value: [0.69565217 0.69565217 0.69565217 0.56521739 0.60869565 0.86956522 0.69565217 0.60869565 0.54545455 0.59090909] mean value: 0.6571146245059288 key: train_accuracy value: [0.65853659 0.69756098 0.71707317 0.67317073 0.74146341 0.70243902 0.73170732 0.70731707 0.73300971 0.74271845] mean value: 0.7104996448022732 key: test_fscore value: [0.74074074 0.72 0.72 0.61538462 0.68965517 0.88 0.75862069 0.70967742 0.61538462 0.64 ] mean value: 0.7089463252933775 key: train_fscore value: [0.72868217 0.74166667 0.75833333 0.74131274 0.77056277 0.74476987 0.76987448 0.74576271 0.72906404 0.77637131] mean value: 0.7506400093172734 key: test_precision value: [0.625 0.64285714 0.64285714 0.53333333 0.58823529 0.84615385 0.64705882 0.57894737 0.53333333 0.57142857] mean value: 0.6209204856031482 key: train_precision value: [0.60645161 0.64963504 0.66423358 0.61538462 0.68992248 0.64963504 0.67153285 0.65671642 0.74 0.68656716] mean value: 0.6630078787347913 key: test_recall value: [0.90909091 0.81818182 0.81818182 0.72727273 0.83333333 0.91666667 0.91666667 0.91666667 0.72727273 0.72727273] mean value: 0.831060606060606 key: train_recall value: [0.91262136 0.86407767 0.88349515 0.93203883 0.87254902 0.87254902 0.90196078 0.8627451 0.7184466 0.89320388] mean value: 0.8713687416714259 key: test_roc_auc value: [0.70454545 0.70075758 0.70075758 0.5719697 0.59848485 0.86742424 0.68560606 0.59469697 0.54545455 0.59090909] mean value: 0.656060606060606 key: train_roc_auc value: [0.65729107 0.69674472 0.71625738 0.67190177 0.74209975 0.7032648 0.73253379 0.70807158 0.73300971 0.74271845] mean value: 0.7103893013516086 key: test_jcc value: [0.58823529 0.5625 0.5625 0.44444444 0.52631579 0.78571429 0.61111111 0.55 0.44444444 0.47058824] mean value: 0.5545853604599734 key: train_jcc value: [0.57317073 0.58940397 0.61073826 0.58895706 0.62676056 0.59333333 0.62585034 0.59459459 0.57364341 0.63448276] mean value: 0.6010935016383199 MCC on Blind test: 0.45 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00957608 0.0098474 0.00965476 0.00997877 0.01009202 0.00920963 0.00911617 0.00889516 0.00902343 0.00907636] mean value: 0.009446978569030762 key: score_time value: [0.00914979 0.00911403 0.00903225 0.00959492 0.00927424 0.0087173 0.00860095 0.00873423 0.00871611 0.00861526] mean value: 0.008954906463623047 key: test_mcc value: [0.58002308 0.12878788 0.12336594 0.21452908 0.39393939 0.39393939 0.05427825 0.39393939 0.18257419 0.20412415] mean value: 0.26695007377892416 key: train_mcc value: [0.43994849 0.50824626 0.49637007 0.45056913 0.46832513 0.45757548 0.46948042 0.48928361 0.48018451 0.46191786] mean value: 0.47219009507999327 key: test_accuracy value: [0.7826087 0.56521739 0.56521739 0.60869565 0.69565217 0.69565217 0.52173913 0.69565217 0.59090909 0.59090909] mean value: 0.6312252964426878 key: train_accuracy value: [0.71707317 0.75121951 0.74634146 0.72195122 0.72682927 0.72682927 0.73170732 0.74146341 0.73786408 0.72815534] mean value: 0.7329434051622069 key: test_fscore value: [0.73684211 0.54545455 0.44444444 0.52631579 0.69565217 0.69565217 0.47619048 0.69565217 0.57142857 0.47058824] mean value: 0.5858220689288127 key: train_fscore value: [0.69473684 0.73298429 0.73195876 0.6984127 0.68539326 0.70526316 0.70588235 0.71657754 0.71875 0.70526316] mean value: 0.7095222063862845 key: test_precision value: [0.875 0.54545455 0.57142857 0.625 0.72727273 0.72727273 0.55555556 0.72727273 0.6 0.66666667] mean value: 0.6620923520923521 key: train_precision value: [0.75862069 0.79545455 0.78021978 0.76744186 0.80263158 0.76136364 0.77647059 0.78823529 0.7752809 0.77011494] mean value: 0.7775833814863701 key: test_recall value: [0.63636364 0.54545455 0.36363636 0.45454545 0.66666667 0.66666667 0.41666667 0.66666667 0.54545455 0.36363636] mean value: 0.5325757575757576 key: train_recall value: [0.6407767 0.67961165 0.68932039 0.6407767 0.59803922 0.65686275 0.64705882 0.65686275 0.66990291 0.65048544] mean value: 0.6529697315819532 key: test_roc_auc value: [0.77651515 0.56439394 0.55681818 0.60227273 0.6969697 0.6969697 0.52651515 0.6969697 0.59090909 0.59090909] mean value: 0.6299242424242424 key: train_roc_auc value: [0.71744717 0.75157053 0.74662098 0.72234913 0.72620407 0.72648962 0.7312964 0.74105273 0.73786408 0.72815534] mean value: 0.7329050066628594 key: test_jcc value: [0.58333333 0.375 0.28571429 0.35714286 0.53333333 0.53333333 0.3125 0.53333333 0.4 0.30769231] mean value: 0.4221382783882784 key: train_jcc value: [0.53225806 0.5785124 0.57723577 0.53658537 0.52136752 0.54471545 0.54545455 0.55833333 0.56097561 0.54471545] mean value: 0.5500153503642167 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00882125 0.00910711 0.01014447 0.00869751 0.00881767 0.0097332 0.00888371 0.00917006 0.00906801 0.00977707] mean value: 0.009222006797790528 key: score_time value: [0.01601529 0.01748419 0.01591444 0.00995803 0.00983834 0.01029491 0.01002693 0.01065493 0.01047945 0.01068568] mean value: 0.01213521957397461 key: test_mcc value: [ 0.12406456 -0.05427825 0.22407133 0.39727608 -0.03816905 0.15096491 0.23262105 0.21452908 -0.09245003 -0.18257419] mean value: 0.0976055502533363 key: train_mcc value: [0.54175 0.5037683 0.49637007 0.50824626 0.49294992 0.54256731 0.54702284 0.44388387 0.495239 0.55433939] mean value: 0.5126136961359685 key: test_accuracy value: [0.56521739 0.47826087 0.60869565 0.69565217 0.47826087 0.56521739 0.60869565 0.60869565 0.45454545 0.40909091] mean value: 0.5472332015810277 key: train_accuracy value: [0.77073171 0.75121951 0.74634146 0.75121951 0.74634146 0.77073171 0.77073171 0.72195122 0.74757282 0.77669903] mean value: 0.7553540137343121 key: test_fscore value: [0.5 0.4 0.47058824 0.63157895 0.45454545 0.5 0.57142857 0.66666667 0.4 0.38095238] mean value: 0.49757602562556125 key: train_fscore value: [0.76847291 0.74371859 0.73195876 0.73298429 0.74 0.76142132 0.75132275 0.71921182 0.75 0.77 ] mean value: 0.7469090449228885 key: test_precision value: [0.55555556 0.44444444 0.66666667 0.75 0.5 0.625 0.66666667 0.6 0.44444444 0.4 ] mean value: 0.5652777777777778 key: train_precision value: [0.78 0.77083333 0.78021978 0.79545455 0.75510204 0.78947368 0.81609195 0.72277228 0.74285714 0.79381443] mean value: 0.7746619191132057 key: test_recall value: [0.45454545 0.36363636 0.36363636 0.54545455 0.41666667 0.41666667 0.5 0.75 0.36363636 0.36363636] mean value: 0.4537878787878788 key: train_recall value: [0.75728155 0.7184466 0.68932039 0.67961165 0.7254902 0.73529412 0.69607843 0.71568627 0.75728155 0.74757282] mean value: 0.722206358271464 key: test_roc_auc value: [0.56060606 0.47348485 0.59848485 0.68939394 0.48106061 0.5719697 0.61363636 0.60227273 0.45454545 0.40909091] mean value: 0.5454545454545454 key: train_roc_auc value: [0.77079764 0.75138016 0.74662098 0.75157053 0.74624024 0.77055968 0.77036931 0.72192081 0.74757282 0.77669903] mean value: 0.7553731201218352 key: test_jcc value: [0.33333333 0.25 0.30769231 0.46153846 0.29411765 0.33333333 0.4 0.5 0.25 0.23529412] mean value: 0.3365309200603318 key: train_jcc value: [0.624 0.592 0.57723577 0.5785124 0.58730159 0.6147541 0.60169492 0.56153846 0.6 0.62601626] mean value: 0.5963053491669482 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01364207 0.01871586 0.01185751 0.01367044 0.01809502 0.01582408 0.01185703 0.01197171 0.0172205 0.0127852 ] mean value: 0.014563941955566406 key: score_time value: [0.01032472 0.01589632 0.01048732 0.01059628 0.01564956 0.00968361 0.00964212 0.01131916 0.01068449 0.00989556] mean value: 0.011417913436889648 key: test_mcc value: [0.47727273 0.56818182 0.31298622 0.38932432 0.47727273 0.58930667 0.56490196 0.65909298 0.18898224 0. ] mean value: 0.4227321652529521 key: train_mcc value: [0.81495251 0.73751939 0.82438607 0.74685628 0.71733345 0.74633543 0.76638754 0.72814868 0.71088536 0.77761579] mean value: 0.7570420501696654 key: test_accuracy value: [0.73913043 0.7826087 0.65217391 0.69565217 0.73913043 0.7826087 0.7826087 0.82608696 0.59090909 0.5 ] mean value: 0.7090909090909091 key: train_accuracy value: [0.90731707 0.86829268 0.91219512 0.87317073 0.85853659 0.87317073 0.88292683 0.86341463 0.85436893 0.88834951] mean value: 0.8781742836845844 key: test_fscore value: [0.72727273 0.7826087 0.66666667 0.66666667 0.75 0.76190476 0.8 0.84615385 0.64 0.52173913] mean value: 0.7163012494751625 key: train_fscore value: [0.90909091 0.86567164 0.91262136 0.87619048 0.85572139 0.87254902 0.88 0.86666667 0.85981308 0.89099526] mean value: 0.8789319810380724 key: test_precision value: [0.72727273 0.75 0.61538462 0.7 0.75 0.88888889 0.76923077 0.78571429 0.57142857 0.5 ] mean value: 0.7057919857919858 key: train_precision value: [0.89622642 0.8877551 0.91262136 0.85981308 0.86868687 0.87254902 0.89795918 0.84259259 0.82882883 0.87037037] mean value: 0.8737402824230579 key: test_recall value: [0.72727273 0.81818182 0.72727273 0.63636364 0.75 0.66666667 0.83333333 0.91666667 0.72727273 0.54545455] mean value: 0.7348484848484849 key: train_recall value: [0.9223301 0.84466019 0.91262136 0.89320388 0.84313725 0.87254902 0.8627451 0.89215686 0.89320388 0.91262136] mean value: 0.8849229011993147 key: test_roc_auc value: [0.73863636 0.78409091 0.65530303 0.69318182 0.73863636 0.78787879 0.78030303 0.8219697 0.59090909 0.5 ] mean value: 0.7090909090909091 key: train_roc_auc value: [0.90724348 0.86840853 0.91219303 0.87307253 0.85846183 0.87316771 0.88282886 0.86355416 0.85436893 0.88834951] mean value: 0.8781648581762802 key: test_jcc value: [0.57142857 0.64285714 0.5 0.5 0.6 0.61538462 0.66666667 0.73333333 0.47058824 0.35294118] mean value: 0.5653199741435035 key: train_jcc value: [0.83333333 0.76315789 0.83928571 0.77966102 0.74782609 0.77391304 0.78571429 0.76470588 0.75409836 0.8034188 ] mean value: 0.7845114421881593 MCC on Blind test: 0.46 Accuracy on Blind test: 0.73 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.26194549 1.00380898 0.56620073 0.398633 0.38738036 0.52770424 0.82983422 0.21668768 0.39636707 0.45531631] mean value: 0.6043878078460694 key: score_time value: [0.01219726 0.01478028 0.0121758 0.01218724 0.01211095 0.01325655 0.01218605 0.01219201 0.01216078 0.01219201] mean value: 0.012543892860412598 key: test_mcc value: [ 0.65151515 0.56490196 0.41096386 0.38932432 0.56879646 0.76277007 0.83971912 0.12844577 0.09090909 -0.18257419] mean value: 0.42247716177937644 key: train_mcc value: [0.86600321 0.95126131 0.71892689 0.58007639 0.61699176 0.64768695 0.76036002 0.52267493 0.58321184 0.80643358] mean value: 0.7053626889094041 key: test_accuracy value: [0.82608696 0.7826087 0.69565217 0.69565217 0.73913043 0.86956522 0.91304348 0.56521739 0.54545455 0.40909091] mean value: 0.7041501976284584 key: train_accuracy value: [0.93170732 0.97560976 0.84390244 0.76097561 0.7902439 0.79512195 0.87804878 0.75609756 0.79126214 0.90291262] mean value: 0.8425882074354725 key: test_fscore value: [0.81818182 0.76190476 0.72 0.66666667 0.66666667 0.88888889 0.90909091 0.66666667 0.54545455 0.43478261] mean value: 0.7078303532216575 key: train_fscore value: [0.93457944 0.97584541 0.86440678 0.80478088 0.74556213 0.82926829 0.87046632 0.77678571 0.78606965 0.9047619 ] mean value: 0.8492526520928274 key: test_precision value: [0.81818182 0.8 0.64285714 0.7 1. 0.8 1. 0.55555556 0.54545455 0.41666667] mean value: 0.7278715728715729 key: train_precision value: [0.9009009 0.97115385 0.76691729 0.68243243 0.94029851 0.70833333 0.92307692 0.71311475 0.80612245 0.88785047] mean value: 0.8300200906960877 key: test_recall value: [0.81818182 0.72727273 0.81818182 0.63636364 0.5 1. 0.83333333 0.83333333 0.54545455 0.45454545] mean value: 0.7166666666666667 key: train_recall value: [0.97087379 0.98058252 0.99029126 0.98058252 0.61764706 1. 0.82352941 0.85294118 0.76699029 0.9223301 ] mean value: 0.8905768132495717 key: test_roc_auc value: [0.82575758 0.78030303 0.70075758 0.69318182 0.75 0.86363636 0.91666667 0.5530303 0.54545455 0.40909091] mean value: 0.7037878787878787 key: train_roc_auc value: [0.93151532 0.97558538 0.84318485 0.75989911 0.78940605 0.7961165 0.87778412 0.75656768 0.79126214 0.90291262] mean value: 0.8424233771178374 key: test_jcc value: [0.69230769 0.61538462 0.5625 0.5 0.5 0.8 0.83333333 0.5 0.375 0.27777778] mean value: 0.5656303418803419 key: train_jcc value: [0.87719298 0.95283019 0.76119403 0.67333333 0.59433962 0.70833333 0.7706422 0.6350365 0.64754098 0.82608696] mean value: 0.7446530128607832 MCC on Blind test: 0.29 Accuracy on Blind test: 0.64 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01721716 0.01397586 0.01314878 0.01377201 0.01334882 0.01375103 0.01316309 0.0134654 0.01368165 0.01338196] mean value: 0.013890576362609864 key: score_time value: [0.01574564 0.00902081 0.00891137 0.00908399 0.00947738 0.0089283 0.00954652 0.00874829 0.00894642 0.00875568] mean value: 0.009716439247131347 key: test_mcc value: [0.65909298 0.82575758 0.65151515 1. 1. 0.91666667 1. 0.74242424 0.73029674 1. ] mean value: 0.8525753362069441 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.82608696 0.91304348 0.82608696 1. 1. 0.95652174 1. 0.86956522 0.86363636 1. ] mean value: 0.925494071146245 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.90909091 0.81818182 1. 1. 0.95652174 1. 0.86956522 0.85714286 1. ] mean value: 0.9210502540937323 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.90909091 0.81818182 1. 1. 1. 1. 0.90909091 0.9 1. ] mean value: 0.9425252525252525 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.90909091 0.81818182 1. 1. 0.91666667 1. 0.83333333 0.81818182 1. ] mean value: 0.9022727272727273 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8219697 0.91287879 0.82575758 1. 1. 0.95833333 1. 0.87121212 0.86363636 1. ] mean value: 0.9253787878787879 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.83333333 0.69230769 1. 1. 0.91666667 1. 0.76923077 0.75 1. ] mean value: 0.8628205128205129 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.54 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09688735 0.09758043 0.09672451 0.0968442 0.09741735 0.09862638 0.09707808 0.09623432 0.09967422 0.10141206] mean value: 0.09784789085388183 key: score_time value: [0.01796174 0.01781249 0.01779795 0.01884341 0.0174849 0.01741695 0.01794338 0.01800895 0.0174613 0.017524 ] mean value: 0.017825508117675783 key: test_mcc value: [0.76764947 0.91666667 0.56818182 0.38932432 0.41096386 0.82575758 0.82575758 0.83743579 0.83205029 0.45454545] mean value: 0.6828332828329148 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.95652174 0.7826087 0.69565217 0.69565217 0.91304348 0.91304348 0.91304348 0.90909091 0.72727273] mean value: 0.8375494071146244 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88 0.95652174 0.7826087 0.66666667 0.66666667 0.91666667 0.91666667 0.92307692 0.9 0.72727273] mean value: 0.8336146751798925 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.78571429 0.91666667 0.75 0.7 0.77777778 0.91666667 0.91666667 0.85714286 1. 0.72727273] mean value: 0.8347907647907647 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.81818182 0.63636364 0.58333333 0.91666667 0.91666667 1. 0.81818182 0.72727273] mean value: 0.8416666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 0.95833333 0.78409091 0.69318182 0.70075758 0.91287879 0.91287879 0.90909091 0.90909091 0.72727273] mean value: 0.8382575757575758 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.78571429 0.91666667 0.64285714 0.5 0.5 0.84615385 0.84615385 0.85714286 0.81818182 0.57142857] mean value: 0.7284299034299034 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.32 Accuracy on Blind test: 0.64 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00908017 0.00915766 0.00907207 0.00912404 0.00897431 0.00912595 0.00911784 0.00910449 0.01033187 0.00916791] mean value: 0.00922563076019287 key: score_time value: [0.00860286 0.00855422 0.00866795 0.00879788 0.00872707 0.00869775 0.00873923 0.00875854 0.00952125 0.00869274] mean value: 0.008775949478149414 key: test_mcc value: [0.03816905 0.56490196 0.30240737 0.03178209 0.65151515 0.5164589 0.38932432 0.74242424 0.56694671 0.36514837] mean value: 0.416907815921681 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.52173913 0.7826087 0.65217391 0.52173913 0.82608696 0.73913043 0.69565217 0.86956522 0.77272727 0.68181818] mean value: 0.7063241106719368 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.47619048 0.76190476 0.6 0.42105263 0.83333333 0.7 0.72 0.86956522 0.73684211 0.66666667] mean value: 0.6785555192328647 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.5 0.8 0.66666667 0.5 0.83333333 0.875 0.69230769 0.90909091 0.875 0.7 ] mean value: 0.7351398601398601 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.45454545 0.72727273 0.54545455 0.36363636 0.83333333 0.58333333 0.75 0.83333333 0.63636364 0.63636364] mean value: 0.6363636363636364 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.51893939 0.78030303 0.64772727 0.51515152 0.82575758 0.74621212 0.69318182 0.87121212 0.77272727 0.68181818] mean value: 0.7053030303030303 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.3125 0.61538462 0.42857143 0.26666667 0.71428571 0.53846154 0.5625 0.76923077 0.58333333 0.5 ] mean value: 0.5290934065934066 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.33109474 1.30057764 1.29539728 1.27576232 1.28721762 1.28771162 1.30014181 1.29339123 1.31685042 1.30856156] mean value: 1.2996706247329712 key: score_time value: [0.09498 0.09537101 0.09173775 0.0889883 0.09702682 0.0884161 0.09529257 0.09618378 0.09623957 0.09434962] mean value: 0.09385855197906494 key: test_mcc value: [0.65909298 0.91666667 0.56818182 0.74047959 0.66414149 0.91666667 0.91605722 0.74242424 0.81818182 0.73029674] mean value: 0.7672189240726363 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.82608696 0.95652174 0.7826087 0.86956522 0.82608696 0.95652174 0.95652174 0.86956522 0.90909091 0.86363636] mean value: 0.8816205533596838 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.95652174 0.7826087 0.85714286 0.81818182 0.95652174 0.96 0.86956522 0.90909091 0.85714286] mean value: 0.876677583286279 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.91666667 0.75 0.9 0.9 1. 0.92307692 0.90909091 0.90909091 0.9 ] mean value: 0.8996814296814297 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 1. 0.81818182 0.81818182 0.75 0.91666667 1. 0.83333333 0.90909091 0.81818182] mean value: 0.8590909090909091 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8219697 0.95833333 0.78409091 0.86742424 0.82954545 0.95833333 0.95454545 0.87121212 0.90909091 0.86363636] mean value: 0.8818181818181818 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.91666667 0.64285714 0.75 0.69230769 0.91666667 0.92307692 0.76923077 0.83333333 0.75 ] mean value: 0.7860805860805861 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.3 Accuracy on Blind test: 0.63 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.93184686 0.88155961 0.97611141 0.94901252 0.89592099 0.96174765 0.93060923 0.96457934 0.91490054 0.93486118] mean value: 0.934114933013916 key: score_time value: [0.24499297 0.24134612 0.18891835 0.13461995 0.24500108 0.2090826 0.13763881 0.20907116 0.14146399 0.24355125] mean value: 0.19956862926483154 key: test_mcc value: [0.65909298 0.76764947 0.58930667 0.65909298 0.74242424 0.83971912 0.82575758 0.83971912 0.73029674 0.54772256] mean value: 0.7200781469442072 key: train_mcc value: [0.9707786 0.94163576 0.97114302 0.961154 0.96116136 0.9707786 0.95163291 0.96116136 0.95150116 0.95186015] mean value: 0.9592806896568655 key: test_accuracy value: [0.82608696 0.86956522 0.7826087 0.82608696 0.86956522 0.91304348 0.91304348 0.91304348 0.86363636 0.77272727] mean value: 0.8549407114624505 key: train_accuracy value: [0.98536585 0.97073171 0.98536585 0.9804878 0.9804878 0.98536585 0.97560976 0.9804878 0.97572816 0.97572816] mean value: 0.9795358749704002 key: test_fscore value: [0.8 0.88 0.8 0.8 0.86956522 0.90909091 0.91666667 0.90909091 0.86956522 0.7826087 ] mean value: 0.8536587615283268 key: train_fscore value: [0.98536585 0.97115385 0.98564593 0.98076923 0.98058252 0.98536585 0.97584541 0.98058252 0.97584541 0.97607656] mean value: 0.9797233142078156 key: test_precision value: [0.88888889 0.78571429 0.71428571 0.88888889 0.90909091 1. 0.91666667 1. 0.83333333 0.75 ] mean value: 0.8686868686868687 key: train_precision /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( value: [0.99019608 0.96190476 0.97169811 0.97142857 0.97115385 0.98058252 0.96190476 0.97115385 0.97115385 0.96226415] mean value: 0.9713440500553794 key: test_recall value: [0.72727273 1. 0.90909091 0.72727273 0.83333333 0.83333333 0.91666667 0.83333333 0.90909091 0.81818182] mean value: 0.8507575757575758 key: train_recall value: [0.98058252 0.98058252 1. 0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 0.98058252 0.99029126] mean value: 0.9883114410812869 key: test_roc_auc value: [0.8219697 0.875 0.78787879 0.8219697 0.87121212 0.91666667 0.91287879 0.91666667 0.86363636 0.77272727] mean value: 0.8560606060606061 key: train_roc_auc value: [0.9853893 0.97068342 0.98529412 0.98043975 0.98053493 0.9853893 0.97568056 0.98053493 0.97572816 0.97572816] mean value: 0.9795402627070245 key: test_jcc value: [0.66666667 0.78571429 0.66666667 0.66666667 0.76923077 0.83333333 0.84615385 0.83333333 0.76923077 0.64285714] mean value: 0.747985347985348 key: train_jcc value: [0.97115385 0.94392523 0.97169811 0.96226415 0.96190476 0.97115385 0.95283019 0.96190476 0.95283019 0.95327103] mean value: 0.9602936119308894 MCC on Blind test: 0.38 Accuracy on Blind test: 0.67 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02315831 0.00965405 0.00990772 0.00992966 0.00958633 0.00925899 0.00968218 0.00935221 0.00954723 0.00899029] mean value: 0.010906696319580078 key: score_time value: [0.010185 0.0090487 0.00989842 0.00874281 0.00877428 0.00885487 0.00888062 0.00934005 0.00959086 0.009305 ] mean value: 0.00926206111907959 key: test_mcc value: [0.58002308 0.12878788 0.12336594 0.21452908 0.39393939 0.39393939 0.05427825 0.39393939 0.18257419 0.20412415] mean value: 0.26695007377892416 key: train_mcc value: [0.43994849 0.50824626 0.49637007 0.45056913 0.46832513 0.45757548 0.46948042 0.48928361 0.48018451 0.46191786] mean value: 0.47219009507999327 key: test_accuracy value: [0.7826087 0.56521739 0.56521739 0.60869565 0.69565217 0.69565217 0.52173913 0.69565217 0.59090909 0.59090909] mean value: 0.6312252964426878 key: train_accuracy value: [0.71707317 0.75121951 0.74634146 0.72195122 0.72682927 0.72682927 0.73170732 0.74146341 0.73786408 0.72815534] mean value: 0.7329434051622069 key: test_fscore value: [0.73684211 0.54545455 0.44444444 0.52631579 0.69565217 0.69565217 0.47619048 0.69565217 0.57142857 0.47058824] mean value: 0.5858220689288127 key: train_fscore value: [0.69473684 0.73298429 0.73195876 0.6984127 0.68539326 0.70526316 0.70588235 0.71657754 0.71875 0.70526316] mean value: 0.7095222063862845 key: test_precision value: [0.875 0.54545455 0.57142857 0.625 0.72727273 0.72727273 0.55555556 0.72727273 0.6 0.66666667] mean value: 0.6620923520923521 key: train_precision value: [0.75862069 0.79545455 0.78021978 0.76744186 0.80263158 0.76136364 0.77647059 0.78823529 0.7752809 0.77011494] mean value: 0.7775833814863701 key: test_recall value: [0.63636364 0.54545455 0.36363636 0.45454545 0.66666667 0.66666667 0.41666667 0.66666667 0.54545455 0.36363636] mean value: 0.5325757575757576 key: train_recall value: [0.6407767 0.67961165 0.68932039 0.6407767 0.59803922 0.65686275 0.64705882 0.65686275 0.66990291 0.65048544] mean value: 0.6529697315819532 key: test_roc_auc value: [0.77651515 0.56439394 0.55681818 0.60227273 0.6969697 0.6969697 0.52651515 0.6969697 0.59090909 0.59090909] mean value: 0.6299242424242424 key: train_roc_auc value: [0.71744717 0.75157053 0.74662098 0.72234913 0.72620407 0.72648962 0.7312964 0.74105273 0.73786408 0.72815534] mean value: 0.7329050066628594 key: test_jcc value: [0.58333333 0.375 0.28571429 0.35714286 0.53333333 0.53333333 0.3125 0.53333333 0.4 0.30769231] mean value: 0.4221382783882784 key: train_jcc value: [0.53225806 0.5785124 0.57723577 0.53658537 0.52136752 0.54471545 0.54545455 0.55833333 0.56097561 0.54471545] mean value: 0.5500153503642167 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.07360411 0.05273223 0.05829406 0.06245947 0.0599854 0.05718923 0.06659317 0.06024432 0.06122375 0.06588459] mean value: 0.06182103157043457 key: score_time value: [0.01040006 0.01061964 0.01049256 0.01047063 0.01048827 0.01023436 0.01153612 0.01126742 0.0113802 0.01140666] mean value: 0.010829591751098632 key: test_mcc value: [0.91666667 0.91666667 0.74242424 1. 0.83971912 0.83971912 1. 1. 0.91287093 1. ] mean value: 0.9168066750452115 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95652174 0.95652174 0.86956522 1. 0.91304348 0.91304348 1. 1. 0.95454545 1. ] mean value: 0.9563241106719368 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95652174 0.95652174 0.86956522 1. 0.90909091 0.90909091 1. 1. 0.95652174 1. ] mean value: 0.9557312252964427 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91666667 0.91666667 0.83333333 1. 1. 1. 1. 1. 0.91666667 1. ] mean value: 0.9583333333333334 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.90909091 1. 0.83333333 0.83333333 1. 1. 1. 1. ] mean value: 0.9575757575757575 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95833333 0.95833333 0.87121212 1. 0.91666667 0.91666667 1. 1. 0.95454545 1. ] mean value: 0.9575757575757575 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.91666667 0.91666667 0.76923077 1. 0.83333333 0.83333333 1. 1. 0.91666667 1. ] mean value: 0.9185897435897435 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.53 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.03170872 0.05358624 0.06331277 0.06383777 0.07470703 0.05675173 0.0568521 0.05640078 0.05648899 0.05701303] mean value: 0.057065916061401364 key: score_time value: [0.02100611 0.0222826 0.02220082 0.02002335 0.02134395 0.02366328 0.02332687 0.01963353 0.02289724 0.02070689] mean value: 0.021708464622497557 key: test_mcc value: [0.76277007 0.56490196 0.58930667 0.48075018 0.6992059 0.82575758 0.56490196 0.58930667 0.91287093 0.56694671] mean value: 0.6556718603789258 key: train_mcc value: [0.92211753 0.9707786 0.94146202 0.92211753 0.94164684 0.93175328 0.95163291 0.95126594 0.91266437 0.94192516] mean value: 0.938736418639331 key: test_accuracy value: [0.86956522 0.7826087 0.7826087 0.73913043 0.82608696 0.91304348 0.7826087 0.7826087 0.95454545 0.77272727] mean value: 0.8205533596837945 key: train_accuracy value: [0.96097561 0.98536585 0.97073171 0.96097561 0.97073171 0.96585366 0.97560976 0.97560976 0.95631068 0.97087379] mean value: 0.9693038124556003 key: test_fscore value: [0.84210526 0.76190476 0.8 0.7 0.8 0.91666667 0.8 0.76190476 0.95238095 0.73684211] mean value: 0.8071804511278196 key: train_fscore value: [0.96153846 0.98536585 0.97087379 0.96153846 0.97087379 0.96585366 0.97584541 0.97560976 0.95652174 0.97115385] mean value: 0.969517476009744 key: test_precision value: [1. 0.8 0.71428571 0.77777778 1. 0.91666667 0.76923077 0.88888889 1. 0.875 ] mean value: 0.8741849816849817 key: train_precision value: [0.95238095 0.99019608 0.97087379 0.95238095 0.96153846 0.96116505 0.96190476 0.97087379 0.95192308 0.96190476] mean value: 0.9635141666823563 key: test_recall value: [0.72727273 0.72727273 0.90909091 0.63636364 0.66666667 0.91666667 0.83333333 0.66666667 0.90909091 0.63636364] mean value: 0.7628787878787878 key: train_recall value: [0.97087379 0.98058252 0.97087379 0.97087379 0.98039216 0.97058824 0.99019608 0.98039216 0.96116505 0.98058252] mean value: 0.975652008376166 key: test_roc_auc value: [0.86363636 0.78030303 0.78787879 0.73484848 0.83333333 0.91287879 0.78030303 0.78787879 0.95454545 0.77272727] mean value: 0.8208333333333333 key: train_roc_auc value: [0.96092709 0.9853893 0.97073101 0.96092709 0.9707786 0.96587664 0.97568056 0.97563297 0.95631068 0.97087379] mean value: 0.9693127736531506 key: test_jcc value: [0.72727273 0.61538462 0.66666667 0.53846154 0.66666667 0.84615385 0.66666667 0.61538462 0.90909091 0.58333333] mean value: 0.6835081585081585 key: train_jcc value: [0.92592593 0.97115385 0.94339623 0.92592593 0.94339623 0.93396226 0.95283019 0.95238095 0.91666667 0.94392523] mean value: 0.9409563456358554 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0219748 0.01021671 0.01002097 0.009799 0.00974965 0.00970197 0.00891066 0.00987864 0.00947976 0.00913072] mean value: 0.010886287689208985 key: score_time value: [0.01003003 0.00961876 0.00944829 0.00925136 0.00877619 0.00935459 0.00935292 0.00873876 0.00937414 0.00935054] mean value: 0.009329557418823242 key: test_mcc value: [0.47727273 0.39393939 0.39393939 0.38932432 0.30240737 0.56818182 0.56490196 0.02585438 0.09759001 0.09245003] mean value: 0.3305861402074863 key: train_mcc value: [0.41481375 0.48545031 0.46581391 0.4461775 0.40046964 0.42940367 0.47412116 0.45056913 0.44098577 0.50679276] mean value: 0.4514597602525188 key: test_accuracy value: [0.73913043 0.69565217 0.69565217 0.69565217 0.65217391 0.7826087 0.7826087 0.52173913 0.54545455 0.54545455] mean value: 0.6656126482213438 key: train_accuracy value: [0.70243902 0.74146341 0.73170732 0.72195122 0.69756098 0.71219512 0.73658537 0.72195122 0.7184466 0.75242718] mean value: 0.7236727444944352 key: test_fscore value: [0.72727273 0.69565217 0.69565217 0.66666667 0.69230769 0.7826087 0.8 0.62068966 0.61538462 0.58333333] mean value: 0.687956773361571 key: train_fscore value: [0.73362445 0.75576037 0.74654378 0.73732719 0.71818182 0.73059361 0.74285714 0.74208145 0.73636364 0.7627907 ] mean value: 0.7406124140900755 key: test_precision value: [0.72727273 0.66666667 0.66666667 0.7 0.64285714 0.81818182 0.76923077 0.52941176 0.53333333 0.53846154] mean value: 0.6592082427376545 key: train_precision value: [0.66666667 0.71929825 0.71052632 0.70175439 0.66949153 0.68376068 0.72222222 0.68907563 0.69230769 0.73214286] mean value: 0.6987246225144372 key: test_recall value: [0.72727273 0.72727273 0.72727273 0.63636364 0.75 0.75 0.83333333 0.75 0.72727273 0.63636364] mean value: 0.7265151515151516 key: train_recall value: [0.81553398 0.7961165 0.78640777 0.77669903 0.7745098 0.78431373 0.76470588 0.80392157 0.78640777 0.7961165 ] mean value: 0.7884732533790215 key: test_roc_auc value: [0.73863636 0.6969697 0.6969697 0.69318182 0.64772727 0.78409091 0.78030303 0.51136364 0.54545455 0.54545455] mean value: 0.6640151515151516 key: train_roc_auc value: [0.70188464 0.74119551 0.73143918 0.72168285 0.69793451 0.71254521 0.73672187 0.72234913 0.7184466 0.75242718] mean value: 0.7236626689510756 key: test_jcc value: [0.57142857 0.53333333 0.53333333 0.5 0.52941176 0.64285714 0.66666667 0.45 0.44444444 0.41176471] mean value: 0.5283239962651727 key: train_jcc value: [0.57931034 0.60740741 0.59558824 0.58394161 0.56028369 0.57553957 0.59090909 0.58992806 0.58273381 0.61654135] mean value: 0.5882183164453261 MCC on Blind test: 0.41 Accuracy on Blind test: 0.7 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01108789 0.0161562 0.01473117 0.01652384 0.01481438 0.015769 0.01692629 0.01509166 0.01592302 0.01968622] mean value: 0.015670967102050782 key: score_time value: [0.00860429 0.01099896 0.01096487 0.01157355 0.01150799 0.01149702 0.011621 0.01159811 0.01157045 0.01161623] mean value: 0.011155247688293457 key: test_mcc value: [0.62050523 0.66414149 0.48856385 0.69084928 0.83971912 0.63327851 0.74047959 0.56490196 0.48795004 0.40824829] mean value: 0.6138637350421967 key: train_mcc value: [0.64013725 0.94146202 0.961154 0.79610703 0.88361919 0.72360351 0.88909823 0.82136935 0.71743005 0.88083033] mean value: 0.8254810956558689 key: test_accuracy value: [0.7826087 0.82608696 0.73913043 0.82608696 0.91304348 0.7826087 0.86956522 0.7826087 0.72727273 0.68181818] mean value: 0.7930830039525691 key: train_accuracy value: [0.7902439 0.97073171 0.9804878 0.88780488 0.94146341 0.84390244 0.94146341 0.90731707 0.83980583 0.9368932 ] mean value: 0.9040113663272555 key: test_fscore value: [0.70588235 0.83333333 0.75 0.77777778 0.90909091 0.73684211 0.88 0.8 0.76923077 0.74074074] mean value: 0.7902897988377864 key: train_fscore value: [0.73619632 0.97087379 0.98076923 0.87431694 0.94230769 0.81395349 0.94444444 0.9124424 0.86192469 0.94063927] mean value: 0.8977868253122568 key: test_precision value: [1. 0.76923077 0.69230769 1. 1. 1. 0.84615385 0.76923077 0.66666667 0.625 ] mean value: 0.8368589743589744 key: train_precision value: [1. 0.97087379 0.97142857 1. 0.9245283 1. 0.89473684 0.86086957 0.75735294 0.88793103] mean value: 0.9267721042705015 key: test_recall value: [0.54545455 0.90909091 0.81818182 0.63636364 0.83333333 0.58333333 0.91666667 0.83333333 0.90909091 0.90909091] mean value: 0.7893939393939394 key: train_recall value: [0.58252427 0.97087379 0.99029126 0.77669903 0.96078431 0.68627451 1. 0.97058824 1. 1. ] mean value: 0.8938035408338092 key: test_roc_auc value: [0.77272727 0.82954545 0.74242424 0.81818182 0.91666667 0.79166667 0.86742424 0.78030303 0.72727273 0.68181818] mean value: 0.7928030303030303 key: train_roc_auc value: [0.79126214 0.97073101 0.98043975 0.88834951 0.94155721 0.84313725 0.94174757 0.90762421 0.83980583 0.9368932 ] mean value: 0.904154768703598 key: test_jcc value: [0.54545455 0.71428571 0.6 0.63636364 0.83333333 0.58333333 0.78571429 0.66666667 0.625 0.58823529] mean value: 0.6578386809269162 key: train_jcc value: [0.58252427 0.94339623 0.96226415 0.77669903 0.89090909 0.68627451 0.89473684 0.83898305 0.75735294 0.88793103] mean value: 0.8221071147654326 MCC on Blind test: 0.33 Accuracy on Blind test: 0.63 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01452732 0.01389766 0.01486945 0.0147233 0.01471424 0.01462889 0.0140667 0.01539159 0.01441121 0.01535964] mean value: 0.0146589994430542 key: score_time value: [0.01154995 0.0115397 0.01155138 0.01168466 0.01158977 0.01151705 0.01147985 0.01160812 0.01153541 0.01155424] mean value: 0.011561012268066407 key: test_mcc value: [0.40451992 0.66414149 0.33371191 0.76764947 0.6992059 0.74242424 0.65151515 0.83971912 0.68313005 0.40824829] mean value: 0.6194265542300403 key: train_mcc value: [0.4515346 0.84539215 0.89473501 0.72360351 0.73146795 0.88558308 0.7922197 0.91330072 0.81319759 0.69427256] mean value: 0.774530686441988 key: test_accuracy value: [0.65217391 0.82608696 0.65217391 0.86956522 0.82608696 0.86956522 0.82608696 0.91304348 0.81818182 0.68181818] mean value: 0.7934782608695652 key: train_accuracy value: [0.66829268 0.92195122 0.94634146 0.84390244 0.84878049 0.94146341 0.88780488 0.95609756 0.89805825 0.82524272] mean value: 0.873793511721525 key: test_fscore value: [0.42857143 0.83333333 0.69230769 0.88 0.8 0.86956522 0.83333333 0.90909091 0.84615385 0.58823529] mean value: 0.7680591054299494 key: train_fscore value: [0.50724638 0.92 0.94835681 0.86554622 0.82080925 0.93877551 0.87431694 0.9569378 0.90748899 0.78823529] mean value: 0.8527713181405282 key: test_precision value: [1. 0.76923077 0.6 0.78571429 1. 0.90909091 0.83333333 1. 0.73333333 0.83333333] mean value: 0.8464035964035964 key: train_precision value: [1. 0.94845361 0.91818182 0.76296296 1. 0.9787234 0.98765432 0.93457944 0.83064516 1. ] mean value: 0.9361200715177836 key: test_recall value: [0.27272727 0.90909091 0.81818182 1. 0.66666667 0.83333333 0.83333333 0.83333333 1. 0.45454545] mean value: 0.7621212121212121 key: train_recall value: [0.33980583 0.89320388 0.98058252 1. 0.69607843 0.90196078 0.78431373 0.98039216 1. 0.65048544] mean value: 0.8226822767942128 key: test_roc_auc value: [0.63636364 0.82954545 0.65909091 0.875 0.83333333 0.87121212 0.82575758 0.91666667 0.81818182 0.68181818] mean value: 0.7946969696969697 key: train_roc_auc value: [0.66990291 0.92209214 0.94617362 0.84313725 0.84803922 0.94127165 0.88730249 0.9562155 0.89805825 0.82524272] mean value: 0.8737435750999429 key: test_jcc value: [0.27272727 0.71428571 0.52941176 0.78571429 0.66666667 0.76923077 0.71428571 0.83333333 0.73333333 0.41666667] mean value: 0.6435655520949639 key: train_jcc value: [0.33980583 0.85185185 0.90178571 0.76296296 0.69607843 0.88461538 0.77669903 0.91743119 0.83064516 0.65048544] mean value: 0.7612360990301472 MCC on Blind test: 0.3 Accuracy on Blind test: 0.64 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.13866186 0.11934733 0.11773157 0.11565089 0.11581945 0.117486 0.11805749 0.12116289 0.11758924 0.11728621] mean value: 0.11987929344177246 key: score_time value: [0.0161581 0.01547503 0.01618123 0.01490808 0.01621389 0.01644397 0.01605368 0.01637292 0.01620722 0.01623797] mean value: 0.016025209426879884 key: test_mcc value: [0.74047959 0.82575758 0.74242424 0.91605722 0.83971912 0.83971912 1. 0.66414149 0.91287093 0.81818182] mean value: 0.8299351113957522 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.91304348 0.86956522 0.95652174 0.91304348 0.91304348 1. 0.82608696 0.95454545 0.90909091] mean value: 0.9124505928853754 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.90909091 0.86956522 0.95238095 0.90909091 0.90909091 1. 0.81818182 0.95238095 0.90909091] mean value: 0.9086015433841521 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.90909091 0.83333333 1. 1. 1. 1. 0.9 1. 0.90909091] mean value: 0.9451515151515152 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.90909091 0.90909091 0.90909091 0.83333333 0.83333333 1. 0.75 0.90909091 0.90909091] mean value: 0.878030303030303 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 0.91287879 0.87121212 0.95454545 0.91666667 0.91666667 1. 0.82954545 0.95454545 0.90909091] mean value: 0.9132575757575758 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.83333333 0.76923077 0.90909091 0.83333333 0.83333333 1. 0.69230769 0.90909091 0.83333333] mean value: 0.8363053613053613 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.01 Accuracy on Blind test: 0.5 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.04288006 0.03579044 0.03872585 0.05015779 0.04801965 0.05010223 0.04300642 0.05224681 0.04316568 0.06040478] mean value: 0.04644997119903564 key: score_time value: [0.01788092 0.0261817 0.01968694 0.03280926 0.02911305 0.04006219 0.02331567 0.02224922 0.0289259 0.03248787] mean value: 0.027271270751953125 key: test_mcc value: [0.58002308 1. 0.65151515 1. 0.83971912 0.83971912 0.83971912 0.91666667 0.91287093 0.83205029] mean value: 0.8412283485098235 key: train_mcc value: [0.98067587 0.99029126 0.98067587 1. 0.99029034 1. 0.99029034 0.98067223 0.99033794 0.98076744] mean value: 0.9884001294873583 key: test_accuracy value: [0.7826087 1. 0.82608696 1. 0.91304348 0.91304348 0.91304348 0.95652174 0.95454545 0.90909091] mean value: 0.916798418972332 key: train_accuracy value: [0.9902439 0.99512195 0.9902439 1. 0.99512195 1. 0.99512195 0.9902439 0.99514563 0.99029126] mean value: 0.9941534454179494 key: test_fscore value: [0.73684211 1. 0.81818182 1. 0.90909091 0.90909091 0.90909091 0.95652174 0.95238095 0.9 ] mean value: 0.909119934222909 key: train_fscore value: [0.99019608 0.99512195 0.99019608 1. 0.99507389 1. 0.99507389 0.99009901 0.99512195 0.99019608] mean value: 0.9941078930885363 key: test_precision value: [0.875 1. 0.81818182 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9693181818181819 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 1. 0.81818182 1. 0.83333333 0.83333333 0.83333333 0.91666667 0.90909091 0.81818182] mean value: 0.8598484848484849 key: train_recall value: [0.98058252 0.99029126 0.98058252 1. 0.99019608 1. 0.99019608 0.98039216 0.99029126 0.98058252] mean value: 0.9883114410812869 key: test_roc_auc value: [0.77651515 1. 0.82575758 1. 0.91666667 0.91666667 0.91666667 0.95833333 0.95454545 0.90909091] mean value: 0.9174242424242425 key: train_roc_auc value: [0.99029126 0.99514563 0.99029126 1. 0.99509804 1. 0.99509804 0.99019608 0.99514563 0.99029126] mean value: 0.9941557205406435 key: test_jcc value: [0.58333333 1. 0.69230769 1. 0.83333333 0.83333333 0.83333333 0.91666667 0.90909091 0.81818182] mean value: 0.8419580419580419 key: train_jcc value: [0.98058252 0.99029126 0.98058252 1. 0.99019608 1. 0.99019608 0.98039216 0.99029126 0.98058252] mean value: 0.9883114410812869 MCC on Blind test: 0.05 Accuracy on Blind test: 0.52 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.02399015 0.02728724 0.03085113 0.06025457 0.07673955 0.06412292 0.06474566 0.06357622 0.0647521 0.06365848] mean value: 0.053997802734375 key: score_time value: [0.0126431 0.0125792 0.01255274 0.02063203 0.02428436 0.02345872 0.02092385 0.0228548 0.02434254 0.02310681] mean value: 0.019737815856933592 key: test_mcc value: [0.38932432 0.47727273 0.21452908 0.30240737 0.66414149 0.76764947 0.5164589 0.74047959 0.54232614 0.2773501 ] mean value: 0.489193919560136 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.69565217 0.73913043 0.60869565 0.65217391 0.82608696 0.86956522 0.73913043 0.86956522 0.72727273 0.63636364] mean value: 0.7363636363636363 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.72727273 0.52631579 0.6 0.81818182 0.85714286 0.7 0.88 0.625 0.6 ] mean value: 0.7000579858737753 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.72727273 0.625 0.66666667 0.9 1. 0.875 0.84615385 1. 0.66666667] mean value: 0.8006759906759907 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.72727273 0.45454545 0.54545455 0.75 0.75 0.58333333 0.91666667 0.45454545 0.54545455] mean value: 0.6363636363636364 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.69318182 0.73863636 0.60227273 0.64772727 0.82954545 0.875 0.74621212 0.86742424 0.72727273 0.63636364] mean value: 0.7363636363636363 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.57142857 0.35714286 0.42857143 0.69230769 0.75 0.53846154 0.78571429 0.45454545 0.42857143] mean value: 0.5506743256743256 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.22 Accuracy on Blind test: 0.61 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.36353517 0.35397005 0.34888005 0.34964776 0.35780954 0.35491943 0.35081267 0.35545444 0.35240722 0.34948468] mean value: 0.35369210243225097 key: score_time value: [0.00919867 0.0091424 0.00909376 0.00912738 0.00930524 0.00908327 0.0090704 0.00978851 0.00921178 0.00905943] mean value: 0.009208083152770996 key: test_mcc value: [0.74047959 0.91666667 0.74242424 1. 0.76764947 1. 1. 0.91666667 0.91287093 1. ] mean value: 0.8996757568581448 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.95652174 0.86956522 1. 0.86956522 1. 1. 0.95652174 0.95454545 1. ] mean value: 0.9476284584980237 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.95652174 0.86956522 1. 0.85714286 1. 1. 0.95652174 0.95652174 1. ] mean value: 0.9453416149068323 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.91666667 0.83333333 1. 1. 1. 1. 1. 0.91666667 1. ] mean value: 0.9566666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.90909091 1. 0.75 1. 1. 0.91666667 1. 1. ] mean value: 0.9393939393939394 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 0.95833333 0.87121212 1. 0.875 1. 1. 0.95833333 0.95454545 1. ] mean value: 0.9484848484848485 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.91666667 0.76923077 1. 0.75 1. 1. 0.91666667 0.91666667 1. ] mean value: 0.9019230769230769 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.54 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02362514 0.0202353 0.0199132 0.02025294 0.0201211 0.02020693 0.02033591 0.02034235 0.01992464 0.02041364] mean value: 0.02053711414337158 key: score_time value: [0.01706982 0.01200247 0.01434779 0.01835537 0.017483 0.02034879 0.0230217 0.01707029 0.01792383 0.01838613] mean value: 0.01760091781616211 key: test_mcc value: [0.56879646 0.6992059 0.37080992 0.50460839 0.76277007 0.76277007 0.69084928 0.83743579 0.64715023 0.75592895] mean value: 0.6600325061286204 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.82608696 0.60869565 0.69565217 0.86956522 0.86956522 0.82608696 0.91304348 0.81818182 0.86363636] mean value: 0.8029644268774704 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.78571429 0.84615385 0.70967742 0.75862069 0.88888889 0.88888889 0.85714286 0.92307692 0.83333333 0.88 ] mean value: 0.8371497132209035 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.64705882 0.73333333 0.55 0.61111111 0.8 0.8 0.75 0.85714286 0.76923077 0.78571429] mean value: 0.7303591180061768 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.90909091 1. ] mean value: 0.990909090909091 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.83333333 0.625 0.70833333 0.86363636 0.86363636 0.81818182 0.90909091 0.81818182 0.86363636] mean value: 0.8053030303030303 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.64705882 0.73333333 0.55 0.61111111 0.8 0.8 0.75 0.85714286 0.71428571 0.78571429] mean value: 0.7248646125116713 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.18 Accuracy on Blind test: 0.54 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02244925 0.04062223 0.03517199 0.03477359 0.05889964 0.01386142 0.01386476 0.01378965 0.0137496 0.03409195] mean value: 0.028127408027648924 key: score_time value: [0.02053857 0.02378178 0.02394128 0.02323008 0.01209402 0.01185203 0.01175451 0.01184201 0.01170516 0.02339268] mean value: 0.01741321086883545 key: test_mcc value: [0.82575758 0.66414149 0.47727273 0.65151515 0.76764947 0.91666667 0.74047959 0.82575758 0.81818182 0.36514837] mean value: 0.7052570438471021 key: train_mcc value: [0.90310636 0.90310636 0.91325992 0.92194936 0.93211467 0.92213232 0.93211467 0.86409538 0.91266437 0.92389898] mean value: 0.9128442392047932 key: test_accuracy value: [0.91304348 0.82608696 0.73913043 0.82608696 0.86956522 0.95652174 0.86956522 0.91304348 0.90909091 0.68181818] mean value: 0.850395256916996 key: train_accuracy value: [0.95121951 0.95121951 0.95609756 0.96097561 0.96585366 0.96097561 0.96585366 0.93170732 0.95631068 0.96116505] mean value: 0.956137816717973 key: test_fscore value: [0.90909091 0.83333333 0.72727273 0.81818182 0.85714286 0.95652174 0.88 0.91666667 0.90909091 0.66666667] mean value: 0.8473967626576322 key: train_fscore value: [0.95238095 0.95238095 0.95734597 0.96116505 0.96618357 0.96116505 0.96618357 0.93269231 0.95652174 0.96226415] mean value: 0.9568283320937857 key: test_precision value: [0.90909091 0.76923077 0.72727273 0.81818182 1. 1. 0.84615385 0.91666667 0.90909091 0.7 ] mean value: 0.8595687645687645 key: train_precision value: [0.93457944 0.93457944 0.93518519 0.96116505 0.95238095 0.95192308 0.95238095 0.91509434 0.95192308 0.93577982] mean value: 0.942499132697801 key: test_recall value: [0.90909091 0.90909091 0.72727273 0.81818182 0.75 0.91666667 0.91666667 0.91666667 0.90909091 0.63636364] mean value: 0.8409090909090909 key: train_recall value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:155: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:158: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [0.97087379 0.97087379 0.98058252 0.96116505 0.98039216 0.97058824 0.98039216 0.95098039 0.96116505 0.99029126] mean value: 0.971730439748715 key: test_roc_auc value: [0.91287879 0.82954545 0.73863636 0.82575758 0.875 0.95833333 0.86742424 0.91287879 0.90909091 0.68181818] mean value: 0.8511363636363636 key: train_roc_auc value: [0.95112317 0.95112317 0.95597754 0.96097468 0.96592423 0.96102227 0.96592423 0.93180088 0.95631068 0.96116505] mean value: 0.9561345897582334 key: test_jcc value: [0.83333333 0.71428571 0.57142857 0.69230769 0.75 0.91666667 0.78571429 0.84615385 0.83333333 0.5 ] mean value: 0.7443223443223443 key: train_jcc value: [0.90909091 0.90909091 0.91818182 0.92523364 0.93457944 0.92523364 0.93457944 0.87387387 0.91666667 0.92727273] mean value: 0.9173803072401203 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.23017311 0.2311604 0.22449851 0.22934937 0.22811341 0.22492957 0.3225956 0.27365017 0.24024272 0.22975492] mean value: 0.24344677925109864 key: score_time value: [0.02268362 0.0237155 0.02395248 0.02225494 0.02246737 0.02187586 0.02325249 0.02057958 0.02395296 0.02296972] mean value: 0.022770452499389648 key: test_mcc value: [0.76277007 0.56818182 0.47727273 0.65151515 0.76764947 0.82575758 0.74047959 0.82575758 0.83205029 0.36514837] mean value: 0.6816582649537872 key: train_mcc value: [0.91223227 0.92211753 0.91325992 0.92194936 0.93211467 0.94164684 0.93211467 0.86409538 0.92250402 0.92389898] mean value: 0.9185933650622469 key: test_accuracy value: [0.86956522 0.7826087 0.73913043 0.82608696 0.86956522 0.91304348 0.86956522 0.91304348 0.90909091 0.68181818] mean value: 0.8373517786561264 key: train_accuracy value: [0.95609756 0.96097561 0.95609756 0.96097561 0.96585366 0.97073171 0.96585366 0.93170732 0.96116505 0.96116505] mean value: 0.9590622780014207 key: test_fscore value: [0.84210526 0.7826087 0.72727273 0.81818182 0.85714286 0.91666667 0.88 0.91666667 0.9 0.66666667] mean value: 0.8307311361407471 key: train_fscore value: [0.95652174 0.96153846 0.95734597 0.96116505 0.96618357 0.97087379 0.96618357 0.93269231 0.96153846 0.96226415] mean value: 0.9596307077116953 key: test_precision value: [1. 0.75 0.72727273 0.81818182 1. 0.91666667 0.84615385 0.91666667 1. 0.7 ] mean value: 0.8674941724941725 key: train_precision value: [0.95192308 0.95238095 0.93518519 0.96116505 0.95238095 0.96153846 0.95238095 0.91509434 0.95238095 0.93577982] mean value: 0.9470209737850626 key: test_recall value: [0.72727273 0.81818182 0.72727273 0.81818182 0.75 0.91666667 0.91666667 0.91666667 0.81818182 0.63636364] mean value: 0.8045454545454546 key: train_recall value: [0.96116505 0.97087379 0.98058252 0.96116505 0.98039216 0.98039216 0.98039216 0.95098039 0.97087379 0.99029126] mean value: 0.9727108319055777 key: test_roc_auc value: [0.86363636 0.78409091 0.73863636 0.82575758 0.875 0.91287879 0.86742424 0.91287879 0.90909091 0.68181818] mean value: 0.8371212121212122 key: train_roc_auc value: [0.95607272 0.96092709 0.95597754 0.96097468 0.96592423 0.9707786 0.96592423 0.93180088 0.96116505 0.96116505] mean value: 0.9590710070435942 key: test_jcc value: [0.72727273 0.64285714 0.57142857 0.69230769 0.75 0.84615385 0.78571429 0.84615385 0.81818182 0.5 ] mean value: 0.718006993006993 key: train_jcc value: [0.91666667 0.92592593 0.91818182 0.92523364 0.93457944 0.94339623 0.93457944 0.87387387 0.92592593 0.92727273] mean value: 0.9225635687626518 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02333498 0.02825093 0.02866554 0.02478814 0.03009248 0.02759218 0.02531862 0.02699327 0.02669168 0.0287993 ] mean value: 0.027052712440490723 key: score_time value: [0.00992012 0.01171041 0.01177144 0.01170754 0.01182556 0.01178455 0.01177168 0.01180267 0.01176238 0.01171088] mean value: 0.0115767240524292 key: test_mcc value: [ 0.37796447 0.49099025 0.74535599 0.57735027 0.28867513 0.42857143 0.8660254 0.17407766 -0.31622777 0.28867513] mean value: 0.39214579792141185 key: train_mcc value: [0.90550595 0.81271824 0.78163175 0.81289702 0.83066386 0.86200967 0.79775192 0.85947992 0.8603207 0.875 ] mean value: 0.8397979034970979 key: test_accuracy value: [0.66666667 0.73333333 0.85714286 0.78571429 0.64285714 0.71428571 0.92857143 0.57142857 0.35714286 0.64285714] mean value: 0.69 key: train_accuracy value: [0.95275591 0.90551181 0.890625 0.90625 0.9140625 0.9296875 0.8984375 0.9296875 0.9296875 0.9375 ] mean value: 0.9194205216535433 key: test_fscore value: [0.70588235 0.71428571 0.875 0.76923077 0.61538462 0.71428571 0.92307692 0.66666667 0.18181818 0.61538462] mean value: 0.6781015553074377 key: train_fscore value: [0.953125 0.90769231 0.89230769 0.90769231 0.91729323 0.93233083 0.896 0.93023256 0.93129771 0.9375 ] mean value: 0.9205471635905882 key: test_precision value: [0.6 0.83333333 0.77777778 0.83333333 0.66666667 0.71428571 1. 0.54545455 0.25 0.66666667] mean value: 0.6887518037518038 key: train_precision value: [0.953125 0.88059701 0.87878788 0.89393939 0.88405797 0.89855072 0.91803279 0.92307692 0.91044776 0.9375 ] mean value: 0.9078115454461019 key: test_recall value: [0.85714286 0.625 1. 0.71428571 0.57142857 0.71428571 0.85714286 0.85714286 0.14285714 0.57142857] mean value: 0.6910714285714286 key: train_recall value: [0.953125 0.93650794 0.90625 0.921875 0.953125 0.96875 0.875 0.9375 0.953125 0.9375 ] mean value: 0.9342757936507936 key: test_roc_auc value: [0.67857143 0.74107143 0.85714286 0.78571429 0.64285714 0.71428571 0.92857143 0.57142857 0.35714286 0.64285714] mean value: 0.6919642857142857 key: train_roc_auc value: [0.95275298 0.90575397 0.890625 0.90625 0.9140625 0.9296875 0.8984375 0.9296875 0.9296875 0.9375 ] mean value: 0.9194444444444444 key: test_jcc value: [0.54545455 0.55555556 0.77777778 0.625 0.44444444 0.55555556 0.85714286 0.5 0.1 0.44444444] mean value: 0.540537518037518 key: train_jcc value: [0.91044776 0.83098592 0.80555556 0.83098592 0.84722222 0.87323944 0.8115942 0.86956522 0.87142857 0.88235294] mean value: 0.8533377739472339 MCC on Blind test: 0.39 Accuracy on Blind test: 0.69 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.85289168 0.70059133 0.73125625 0.85352159 0.73122883 0.70029855 0.80738592 0.64156055 0.63386559 0.79457211] mean value: 0.7447172403335571 key: score_time value: [0.01466966 0.01212597 0.01516724 0.01518154 0.01209664 0.01497483 0.01526499 0.01661897 0.01539016 0.01516986] mean value: 0.014665985107421875 key: test_mcc value: [ 0.21821789 0.33928571 0.57735027 0.8660254 0.42857143 0.57735027 0.74535599 0.42857143 -0.14285714 0.1490712 ] mean value: 0.4186942451971027 key: train_mcc value: [1. 0.93748452 0.90669283 1. 0.89073374 1. 1. 1. 1. 1. ] mean value: 0.9734911092040202 key: test_accuracy value: [0.6 0.66666667 0.78571429 0.92857143 0.71428571 0.78571429 0.85714286 0.71428571 0.42857143 0.57142857] mean value: 0.7052380952380952 key: train_accuracy value: [1. 0.96850394 0.953125 1. 0.9453125 1. 1. 1. 1. 1. ] mean value: 0.9866941437007875 key: test_fscore value: [0.625 0.66666667 0.8 0.92307692 0.71428571 0.76923077 0.83333333 0.71428571 0.42857143 0.5 ] mean value: 0.6974450549450549 key: train_fscore value: [1. 0.96875 0.95384615 1. 0.94573643 1. 1. 1. 1. 1. ] mean value: 0.9868332587954681 key: test_precision value: [0.55555556 0.71428571 0.75 1. 0.71428571 0.83333333 1. 0.71428571 0.42857143 0.6 ] mean value: 0.731031746031746 key: train_precision value: [1. 0.95384615 0.93939394 1. 0.93846154 1. 1. 1. 1. 1. ] mean value: 0.9831701631701631 key: test_recall value: [0.71428571 0.625 0.85714286 0.85714286 0.71428571 0.71428571 0.71428571 0.71428571 0.42857143 0.42857143] mean value: 0.6767857142857143 key: train_recall value: [1. 0.98412698 0.96875 1. 0.953125 1. 1. 1. 1. 1. ] mean value: 0.9906001984126984 key: test_roc_auc value: [0.60714286 0.66964286 0.78571429 0.92857143 0.71428571 0.78571429 0.85714286 0.71428571 0.42857143 0.57142857] mean value: 0.70625 key: train_roc_auc value: [1. 0.96862599 0.953125 1. 0.9453125 1. 1. 1. 1. 1. ] mean value: 0.9867063492063493 key: test_jcc value: [0.45454545 0.5 0.66666667 0.85714286 0.55555556 0.625 0.71428571 0.55555556 0.27272727 0.33333333] mean value: 0.553481240981241 key: train_jcc value: [1. 0.93939394 0.91176471 1. 0.89705882 1. 1. 1. 1. 1. ] mean value: 0.9748217468805704 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01235104 0.01005244 0.01008081 0.00952315 0.00941539 0.00865269 0.00883651 0.00973344 0.00860357 0.00904226] mean value: 0.009629130363464355 key: score_time value: [0.01814413 0.00991583 0.00927973 0.00927973 0.0089798 0.00896645 0.00927234 0.00917888 0.00874782 0.00863695] mean value: 0.01004016399383545 key: test_mcc value: [ 0.26189246 0.18898224 0.17407766 0.40824829 0.17407766 0.31622777 0.1490712 0. -0.2773501 0.31622777] mean value: 0.17114549346091681 key: train_mcc value: [0.41221894 0.3438986 0.41858962 0.40451992 0.43084241 0.4031367 0.35377457 0.44649977 0.39637502 0.36808134] mean value: 0.3977936882004 key: test_accuracy value: [0.6 0.6 0.57142857 0.64285714 0.57142857 0.64285714 0.57142857 0.5 0.42857143 0.64285714] mean value: 0.5771428571428572 key: train_accuracy value: [0.67716535 0.62992126 0.6953125 0.640625 0.6953125 0.6796875 0.65625 0.6953125 0.6796875 0.6640625 ] mean value: 0.6713336614173229 key: test_fscore value: [0.66666667 0.66666667 0.66666667 0.73684211 0.66666667 0.70588235 0.625 0.63157895 0.6 0.70588235] mean value: 0.6671852425180598 key: train_fscore value: [0.74534161 0.71856287 0.74172185 0.73563218 0.7483871 0.7388535 0.72151899 0.75471698 0.73548387 0.72611465] mean value: 0.7366333616453036 key: test_precision value: [0.54545455 0.6 0.54545455 0.58333333 0.54545455 0.6 0.55555556 0.5 0.46153846 0.6 ] mean value: 0.5536790986790987 key: train_precision value: [0.6185567 0.57692308 0.64367816 0.58181818 0.63736264 0.62365591 0.60638298 0.63157895 0.62637363 0.61290323] mean value: 0.6159233450304762 key: test_recall value: [0.85714286 0.75 0.85714286 1. 0.85714286 0.85714286 0.71428571 0.85714286 0.85714286 0.85714286] mean value: 0.8464285714285714 key: train_recall value: [0.9375 0.95238095 0.875 1. 0.90625 0.90625 0.890625 0.9375 0.890625 0.890625 ] mean value: 0.9186755952380953 key: test_roc_auc value: [0.61607143 0.58928571 0.57142857 0.64285714 0.57142857 0.64285714 0.57142857 0.5 0.42857143 0.64285714] mean value: 0.5776785714285715 key: train_roc_auc value: [0.67509921 0.63244048 0.6953125 0.640625 0.6953125 0.6796875 0.65625 0.6953125 0.6796875 0.6640625 ] mean value: 0.6713789682539683 key: test_jcc value: [0.5 0.5 0.5 0.58333333 0.5 0.54545455 0.45454545 0.46153846 0.42857143 0.54545455] mean value: 0.5018897768897769 key: train_jcc value: [0.59405941 0.56074766 0.58947368 0.58181818 0.59793814 0.58585859 0.56435644 0.60606061 0.58163265 0.57 ] mean value: 0.5831945360474582 MCC on Blind test: 0.43 Accuracy on Blind test: 0.69 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00890517 0.00869513 0.00965166 0.00883913 0.00877833 0.00885248 0.00935841 0.00872326 0.00894189 0.00988293] mean value: 0.009062838554382325 key: score_time value: [0.00861073 0.00870466 0.00879693 0.0087924 0.00871682 0.00864601 0.0091064 0.00876236 0.0087378 0.00916338] mean value: 0.008803749084472656 key: test_mcc value: [ 0.18898224 0.49099025 0.4472136 -0.31622777 0.14285714 0.42857143 0. -0.1490712 -0.63245553 0.1490712 ] mean value: 0.0749931358413612 key: train_mcc value: [0.48209995 0.40158859 0.438357 0.42233925 0.42610928 0.40946151 0.43943537 0.50024432 0.53229065 0.438357 ] mean value: 0.44902829325504817 key: test_accuracy value: [0.6 0.73333333 0.71428571 0.35714286 0.57142857 0.71428571 0.5 0.42857143 0.21428571 0.57142857] mean value: 0.5404761904761904 key: train_accuracy value: [0.74015748 0.7007874 0.71875 0.7109375 0.7109375 0.703125 0.71875 0.75 0.765625 0.71875 ] mean value: 0.7237819881889764 key: test_fscore value: [0.5 0.71428571 0.75 0.18181818 0.57142857 0.71428571 0.36363636 0.5 0. 0.5 ] mean value: 0.47954545454545455 key: train_fscore value: [0.73170732 0.69354839 0.70967742 0.704 0.68907563 0.68333333 0.70491803 0.75384615 0.75806452 0.70967742] mean value: 0.7137848209227128 key: test_precision value: [0.6 0.83333333 0.66666667 0.25 0.57142857 0.71428571 0.5 0.44444444 0. 0.6 ] mean value: 0.518015873015873 key: train_precision value: [0.76271186 0.70491803 0.73333333 0.72131148 0.74545455 0.73214286 0.74137931 0.74242424 0.78333333 0.73333333] mean value: 0.7400342327969973 key: test_recall value: [0.42857143 0.625 0.85714286 0.14285714 0.57142857 0.71428571 0.28571429 0.57142857 0. 0.42857143] mean value: 0.46249999999999997 key: train_recall value: [0.703125 0.68253968 0.6875 0.6875 0.640625 0.640625 0.671875 0.765625 0.734375 0.6875 ] mean value: 0.6901289682539683 key: test_roc_auc value: [0.58928571 0.74107143 0.71428571 0.35714286 0.57142857 0.71428571 0.5 0.42857143 0.21428571 0.57142857] mean value: 0.5401785714285714 key: train_roc_auc value: [0.74045139 0.70064484 0.71875 0.7109375 0.7109375 0.703125 0.71875 0.75 0.765625 0.71875 ] mean value: 0.7237971230158731 key: test_jcc value: [0.33333333 0.55555556 0.6 0.1 0.4 0.55555556 0.22222222 0.33333333 0. 0.33333333] mean value: 0.3433333333333333 key: train_jcc value: [0.57692308 0.5308642 0.55 0.54320988 0.52564103 0.51898734 0.5443038 0.60493827 0.61038961 0.55 ] mean value: 0.5555257197873231 MCC on Blind test: 0.27 Accuracy on Blind test: 0.63 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00843072 0.00828362 0.0094943 0.00941944 0.00952077 0.00883842 0.00867176 0.00860167 0.00916886 0.00823045] mean value: 0.008866000175476074 key: score_time value: [0.00952911 0.00987101 0.01023984 0.0103004 0.01070189 0.01650286 0.01474071 0.01097417 0.00975394 0.00930381] mean value: 0.011191773414611816 key: test_mcc value: [ 0.47245559 -0.37796447 -0.28867513 -0.1490712 0.14285714 0. 0. -0.28867513 -0.57735027 -0.28867513] mean value: -0.13550986103646007 key: train_mcc value: [0.41894709 0.29176205 0.36047677 0.29866683 0.39298268 0.438357 0.37518324 0.375 0.438357 0.3480246 ] mean value: 0.3737757275384978 key: test_accuracy value: [0.73333333 0.33333333 0.35714286 0.42857143 0.57142857 0.5 0.5 0.35714286 0.21428571 0.35714286] mean value: 0.4352380952380952 key: train_accuracy value: [0.70866142 0.64566929 0.6796875 0.6484375 0.6953125 0.71875 0.6875 0.6875 0.71875 0.671875 ] mean value: 0.6862143208661418 key: test_fscore value: [0.66666667 0.44444444 0.4 0.5 0.57142857 0.46153846 0.46153846 0.30769231 0.26666667 0.30769231] mean value: 0.43876678876678876 key: train_fscore value: [0.69918699 0.62809917 0.66666667 0.62809917 0.67768595 0.70967742 0.69230769 0.6875 0.70967742 0.6440678 ] mean value: 0.6742968283684786 key: test_precision value: [0.8 0.4 0.375 0.44444444 0.57142857 0.5 0.5 0.33333333 0.25 0.33333333] mean value: 0.45075396825396824 key: train_precision value: [0.72881356 0.65517241 0.69491525 0.66666667 0.71929825 0.73333333 0.68181818 0.6875 0.73333333 0.7037037 ] mean value: 0.700455469182168 key: test_recall value: [0.57142857 0.5 0.42857143 0.57142857 0.57142857 0.42857143 0.42857143 0.28571429 0.28571429 0.28571429] mean value: 0.4357142857142857 key: train_recall value: [0.671875 0.6031746 0.640625 0.59375 0.640625 0.6875 0.703125 0.6875 0.6875 0.59375 ] mean value: 0.6509424603174603 key: test_roc_auc value: [0.72321429 0.32142857 0.35714286 0.42857143 0.57142857 0.5 0.5 0.35714286 0.21428571 0.35714286] mean value: 0.4330357142857143 key: train_roc_auc value: [0.70895337 0.6453373 0.6796875 0.6484375 0.6953125 0.71875 0.6875 0.6875 0.71875 0.671875 ] mean value: 0.6862103174603175 key: test_jcc value: [0.5 0.28571429 0.25 0.33333333 0.4 0.3 0.3 0.18181818 0.15384615 0.18181818] mean value: 0.28865301365301366 key: train_jcc value: [0.5375 0.45783133 0.5 0.45783133 0.5125 0.55 0.52941176 0.52380952 0.55 0.475 ] mean value: 0.5093883939117816 MCC on Blind test: 0.17 Accuracy on Blind test: 0.58 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01124406 0.01096201 0.01099586 0.00968695 0.00980282 0.00962329 0.00957036 0.01035261 0.00969028 0.00992322] mean value: 0.010185146331787109 key: score_time value: [0.00972486 0.00949264 0.00957704 0.00875735 0.00879574 0.00882769 0.0090363 0.00884104 0.00873518 0.00879645] mean value: 0.009058427810668946 key: test_mcc value: [ 0.33928571 0.09449112 0.4472136 0.71428571 0.14285714 0.42857143 0.31622777 0. -0.31622777 0. ] mean value: 0.21667047137522646 key: train_mcc value: [0.63789683 0.6852819 0.64070322 0.71910121 0.6253054 0.67195703 0.6253054 0.78125 0.72015793 0.59491308] mean value: 0.6701871989258454 key: test_accuracy value: [0.66666667 0.53333333 0.71428571 0.85714286 0.57142857 0.71428571 0.64285714 0.5 0.35714286 0.5 ] mean value: 0.6057142857142858 key: train_accuracy value: [0.81889764 0.84251969 0.8203125 0.859375 0.8125 0.8359375 0.8125 0.890625 0.859375 0.796875 ] mean value: 0.8348917322834646 key: test_fscore value: [0.66666667 0.46153846 0.75 0.85714286 0.57142857 0.71428571 0.54545455 0.53333333 0.18181818 0.46153846] mean value: 0.5743206793206793 key: train_fscore value: [0.81889764 0.83870968 0.81889764 0.86153846 0.81538462 0.83464567 0.81538462 0.890625 0.86363636 0.79032258] mean value: 0.8348042258890461 key: test_precision value: [0.625 0.6 0.66666667 0.85714286 0.57142857 0.71428571 0.75 0.5 0.25 0.5 ] mean value: 0.603452380952381 key: train_precision value: [0.82539683 0.85245902 0.82539683 0.84848485 0.8030303 0.84126984 0.8030303 0.890625 0.83823529 0.81666667] mean value: 0.8344594923786702 key: test_recall value: [0.71428571 0.375 0.85714286 0.85714286 0.57142857 0.71428571 0.42857143 0.57142857 0.14285714 0.42857143] mean value: 0.5660714285714286 key: train_recall value: [0.8125 0.82539683 0.8125 0.875 0.828125 0.828125 0.828125 0.890625 0.890625 0.765625 ] mean value: 0.8356646825396825 key: test_roc_auc value: [0.66964286 0.54464286 0.71428571 0.85714286 0.57142857 0.71428571 0.64285714 0.5 0.35714286 0.5 ] mean value: 0.6071428571428572 key: train_roc_auc value: [0.81894841 0.84238591 0.8203125 0.859375 0.8125 0.8359375 0.8125 0.890625 0.859375 0.796875 ] mean value: 0.8348834325396826 key: test_jcc value: [0.5 0.3 0.6 0.75 0.4 0.55555556 0.375 0.36363636 0.1 0.3 ] mean value: 0.4244191919191919 key: train_jcc value: [0.69333333 0.72222222 0.69333333 0.75675676 0.68831169 0.71621622 0.68831169 0.8028169 0.76 0.65333333] mean value: 0.7174635473227022 MCC on Blind test: 0.42 Accuracy on Blind test: 0.71 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.72048855 0.58740497 0.71089745 0.57400823 0.57025862 0.57146072 0.68641615 0.55554175 0.5684855 0.64807653] mean value: 0.619303846359253 key: score_time value: [0.01467466 0.01231575 0.01452303 0.01452589 0.01463056 0.01481271 0.01819897 0.01205873 0.01488829 0.01502252] mean value: 0.014565110206604004 key: test_mcc value: [ 0.33928571 0.21821789 0.1490712 0.57735027 0.14285714 0.57735027 0.8660254 0. -0.28867513 0.28867513] mean value: 0.28701578880425255 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.6 0.57142857 0.78571429 0.57142857 0.78571429 0.92857143 0.5 0.35714286 0.64285714] mean value: 0.6409523809523809 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.57142857 0.625 0.8 0.57142857 0.8 0.93333333 0.53333333 0.30769231 0.61538462] mean value: 0.6424267399267399 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.625 0.66666667 0.55555556 0.75 0.57142857 0.75 0.875 0.5 0.33333333 0.66666667] mean value: 0.6293650793650793 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.71428571 0.5 0.71428571 0.85714286 0.57142857 0.85714286 1. 0.57142857 0.28571429 0.57142857] mean value: 0.6642857142857143 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66964286 0.60714286 0.57142857 0.78571429 0.57142857 0.78571429 0.92857143 0.5 0.35714286 0.64285714] mean value: 0.6419642857142858 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.4 0.45454545 0.66666667 0.4 0.66666667 0.875 0.36363636 0.18181818 0.44444444] mean value: 0.49527777777777776 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.29 Accuracy on Blind test: 0.64 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02736211 0.01128983 0.01124406 0.01253748 0.01097155 0.01206899 0.01089215 0.01065326 0.01102686 0.01143122] mean value: 0.012947750091552735 key: score_time value: [0.01159692 0.00895143 0.00873518 0.00955987 0.00854182 0.00929856 0.00847149 0.00848365 0.00856256 0.00918412] mean value: 0.00913856029510498 key: test_mcc value: [0.33928571 0.875 1. 0.8660254 0.57735027 0.8660254 0.63245553 0. 1. 0.28867513] mean value: 0.6444817457672706 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.93333333 1. 0.92857143 0.78571429 0.92857143 0.78571429 0.5 1. 0.64285714] mean value: 0.8171428571428572 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.93333333 1. 0.92307692 0.76923077 0.93333333 0.72727273 0.53333333 1. 0.61538462] mean value: 0.8101631701631702 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.625 1. 1. 1. 0.83333333 0.875 1. 0.5 1. 0.66666667] mean value: 0.85 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.71428571 0.875 1. 0.85714286 0.71428571 1. 0.57142857 0.57142857 1. 0.57142857] mean value: 0.7875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66964286 0.9375 1. 0.92857143 0.78571429 0.92857143 0.78571429 0.5 1. 0.64285714] mean value: 0.8178571428571428 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.875 1. 0.85714286 0.625 0.875 0.57142857 0.36363636 1. 0.44444444] mean value: 0.7111652236652236 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.52 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.08901286 0.09013486 0.08958101 0.08881617 0.08836842 0.08858609 0.08868885 0.08849192 0.08824515 0.08973527] mean value: 0.08896605968475342 key: score_time value: [0.01702809 0.01857686 0.01709199 0.01713276 0.01712823 0.01718926 0.01714444 0.01747155 0.01718402 0.01791286] mean value: 0.01738600730895996 key: test_mcc value: [ 0.19642857 0.07142857 0.74535599 0.57735027 0.42857143 0.57735027 0.4472136 -0.28867513 -0.4472136 0.1490712 ] mean value: 0.24568811662129258 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.6 0.53333333 0.85714286 0.78571429 0.71428571 0.78571429 0.71428571 0.35714286 0.28571429 0.57142857] mean value: 0.6204761904761905 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.57142857 0.53333333 0.83333333 0.8 0.71428571 0.8 0.66666667 0.4 0.16666667 0.5 ] mean value: 0.5985714285714285 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.57142857 0.57142857 1. 0.75 0.71428571 0.75 0.8 0.375 0.2 0.6 ] mean value: 0.6332142857142857 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.57142857 0.5 0.71428571 0.85714286 0.71428571 0.85714286 0.57142857 0.42857143 0.14285714 0.42857143] mean value: 0.5785714285714285 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.59821429 0.53571429 0.85714286 0.78571429 0.71428571 0.78571429 0.71428571 0.35714286 0.28571429 0.57142857] mean value: 0.6205357142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.4 0.36363636 0.71428571 0.66666667 0.55555556 0.66666667 0.5 0.25 0.09090909 0.33333333] mean value: 0.4541053391053391 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.31 Accuracy on Blind test: 0.65 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.0096097 0.0091033 0.00874829 0.00875807 0.008811 0.00929141 0.00984359 0.00904584 0.00888586 0.00879669] mean value: 0.009089374542236328 key: score_time value: [0.00904441 0.00886655 0.00872993 0.00869799 0.0087254 0.0087328 0.00910592 0.00863886 0.00867748 0.00856519] mean value: 0.00877845287322998 key: test_mcc value: [ 0.33928571 0.07142857 0.57735027 0.42857143 0.57735027 0.1490712 0. -0.14285714 -0.42857143 0.1490712 ] mean value: 0.17207000782363663 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.53333333 0.78571429 0.71428571 0.78571429 0.57142857 0.5 0.42857143 0.28571429 0.57142857] mean value: 0.5842857142857143 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.53333333 0.8 0.71428571 0.8 0.625 0.36363636 0.42857143 0.28571429 0.5 ] mean value: 0.5717207792207792 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.625 0.57142857 0.75 0.71428571 0.75 0.55555556 0.5 0.42857143 0.28571429 0.6 ] mean value: 0.5780555555555555 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.71428571 0.5 0.85714286 0.71428571 0.85714286 0.71428571 0.28571429 0.42857143 0.28571429 0.42857143] mean value: 0.5785714285714285 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66964286 0.53571429 0.78571429 0.71428571 0.78571429 0.57142857 0.5 0.42857143 0.28571429 0.57142857] mean value: 0.5848214285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.36363636 0.66666667 0.55555556 0.66666667 0.45454545 0.22222222 0.27272727 0.16666667 0.33333333] mean value: 0.4202020202020202 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.55 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.13835144 1.1283567 1.13199878 1.13944674 1.13446093 1.13452864 1.12859964 1.1291821 1.12874389 1.12446833] mean value: 1.1318137168884277 key: score_time value: [0.08793807 0.08876872 0.09104156 0.08774018 0.08761907 0.08778667 0.14704132 0.09132361 0.09439731 0.09718728] mean value: 0.0960843801498413 key: test_mcc value: [0.37796447 0.76376262 0.8660254 0.8660254 0.8660254 0.74535599 0.74535599 0. 0.4472136 0.42857143] mean value: 0.6106300309259763 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.86666667 0.92857143 0.92857143 0.92857143 0.85714286 0.85714286 0.5 0.71428571 0.71428571] mean value: 0.7961904761904762 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.70588235 0.85714286 0.92307692 0.92307692 0.92307692 0.875 0.83333333 0.53333333 0.66666667 0.71428571] mean value: 0.795487502693385 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6 1. 1. 1. 1. 0.77777778 1. 0.5 0.8 0.71428571] mean value: 0.8392063492063492 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.85714286 0.75 0.85714286 0.85714286 0.85714286 1. 0.71428571 0.57142857 0.57142857 0.71428571] mean value: 0.775 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.67857143 0.875 0.92857143 0.92857143 0.92857143 0.85714286 0.85714286 0.5 0.71428571 0.71428571] mean value: 0.7982142857142858 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.54545455 0.75 0.85714286 0.85714286 0.85714286 0.77777778 0.71428571 0.36363636 0.5 0.55555556] mean value: 0.6778138528138528 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.38 Accuracy on Blind test: 0.69 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.83326268 0.87439442 0.91672993 0.89082122 0.88721848 0.86894846 0.90624857 0.85243893 0.90239453 0.89272261] mean value: 0.8825179815292359 key: score_time value: [0.22324824 0.22895718 0.18167663 0.22090197 0.22210288 0.22428799 0.13536072 0.24140263 0.21208525 0.23036623] mean value: 0.21203896999359131 key: test_mcc value: [0.37796447 0.60714286 0.8660254 1. 0.71428571 0.74535599 0.63245553 0.1490712 0.31622777 0.42857143] mean value: 0.5837100365844096 key: train_mcc value: [0.93745372 0.93889821 0.93933644 0.93933644 0.90802522 0.95417386 0.92288947 0.95417386 0.93933644 0.9379581 ] mean value: 0.9371581765131688 key: test_accuracy value: [0.66666667 0.8 0.92857143 1. 0.85714286 0.85714286 0.78571429 0.57142857 0.64285714 0.71428571] mean value: 0.7823809523809524 key: train_accuracy value: [0.96850394 0.96850394 0.96875 0.96875 0.953125 0.9765625 0.9609375 0.9765625 0.96875 0.96875 ] mean value: 0.9679195374015748 key: test_fscore value: [0.70588235 0.8 0.93333333 1. 0.85714286 0.875 0.72727273 0.625 0.54545455 0.71428571] mean value: 0.7783371530430354 key: train_fscore value: [0.96923077 0.96923077 0.96969697 0.96969697 0.95454545 0.97709924 0.96183206 0.97709924 0.96969697 0.96923077] mean value: 0.9687359205679816 key: test_precision value: [0.6 0.85714286 0.875 1. 0.85714286 0.77777778 1. 0.55555556 0.75 0.71428571] mean value: 0.7986904761904762 key: train_precision value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( [0.95454545 0.94029851 0.94117647 0.94117647 0.92647059 0.95522388 0.94029851 0.95522388 0.94117647 0.95454545] mean value: 0.9450135685210312 key: test_recall value: [0.85714286 0.75 1. 1. 0.85714286 1. 0.57142857 0.71428571 0.42857143 0.71428571] mean value: 0.7892857142857143 key: train_recall value: [0.984375 1. 1. 1. 0.984375 1. 0.984375 1. 1. 0.984375] mean value: 0.99375 key: test_roc_auc value: [0.67857143 0.80357143 0.92857143 1. 0.85714286 0.85714286 0.78571429 0.57142857 0.64285714 0.71428571] mean value: 0.7839285714285714 key: train_roc_auc value: [0.96837798 0.96875 0.96875 0.96875 0.953125 0.9765625 0.9609375 0.9765625 0.96875 0.96875 ] mean value: 0.9679315476190476 key: test_jcc value: [0.54545455 0.66666667 0.875 1. 0.75 0.77777778 0.57142857 0.45454545 0.375 0.55555556] mean value: 0.6571428571428571 key: train_jcc value: [0.94029851 0.94029851 0.94117647 0.94117647 0.91304348 0.95522388 0.92647059 0.95522388 0.94117647 0.94029851] mean value: 0.9394386761842959 MCC on Blind test: 0.46 Accuracy on Blind test: 0.72 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02314949 0.00937533 0.0098002 0.00965309 0.0087173 0.00990558 0.00964594 0.00971007 0.00986576 0.00981593] mean value: 0.010963869094848634 key: score_time value: [0.01428938 0.00957179 0.00965691 0.00903916 0.00860858 0.00936079 0.00952983 0.00949478 0.00943947 0.00927114] mean value: 0.009826183319091797 key: test_mcc value: [ 0.18898224 0.49099025 0.4472136 -0.31622777 0.14285714 0.42857143 0. -0.1490712 -0.63245553 0.1490712 ] mean value: 0.0749931358413612 key: train_mcc value: [0.48209995 0.40158859 0.438357 0.42233925 0.42610928 0.40946151 0.43943537 0.50024432 0.53229065 0.438357 ] mean value: 0.44902829325504817 key: test_accuracy value: [0.6 0.73333333 0.71428571 0.35714286 0.57142857 0.71428571 0.5 0.42857143 0.21428571 0.57142857] mean value: 0.5404761904761904 key: train_accuracy value: [0.74015748 0.7007874 0.71875 0.7109375 0.7109375 0.703125 0.71875 0.75 0.765625 0.71875 ] mean value: 0.7237819881889764 key: test_fscore value: [0.5 0.71428571 0.75 0.18181818 0.57142857 0.71428571 0.36363636 0.5 0. 0.5 ] mean value: 0.47954545454545455 key: train_fscore value: [0.73170732 0.69354839 0.70967742 0.704 0.68907563 0.68333333 0.70491803 0.75384615 0.75806452 0.70967742] mean value: 0.7137848209227128 key: test_precision value: [0.6 0.83333333 0.66666667 0.25 0.57142857 0.71428571 0.5 0.44444444 0. 0.6 ] mean value: 0.518015873015873 key: train_precision value: [0.76271186 0.70491803 0.73333333 0.72131148 0.74545455 0.73214286 0.74137931 0.74242424 0.78333333 0.73333333] mean value: 0.7400342327969973 key: test_recall value: [0.42857143 0.625 0.85714286 0.14285714 0.57142857 0.71428571 0.28571429 0.57142857 0. 0.42857143] mean value: 0.46249999999999997 key: train_recall value: [0.703125 0.68253968 0.6875 0.6875 0.640625 0.640625 0.671875 0.765625 0.734375 0.6875 ] mean value: 0.6901289682539683 key: test_roc_auc value: [0.58928571 0.74107143 0.71428571 0.35714286 0.57142857 0.71428571 0.5 0.42857143 0.21428571 0.57142857] mean value: 0.5401785714285714 key: train_roc_auc value: [0.74045139 0.70064484 0.71875 0.7109375 0.7109375 0.703125 0.71875 0.75 0.765625 0.71875 ] mean value: 0.7237971230158731 key: test_jcc value: [0.33333333 0.55555556 0.6 0.1 0.4 0.55555556 0.22222222 0.33333333 0. 0.33333333] mean value: 0.3433333333333333 key: train_jcc value: [0.57692308 0.5308642 0.55 0.54320988 0.52564103 0.51898734 0.5443038 0.60493827 0.61038961 0.55 ] mean value: 0.5555257197873231 MCC on Blind test: 0.27 Accuracy on Blind test: 0.63 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.06581926 0.05272436 0.0533154 0.04687142 0.04571605 0.04804492 0.05116105 0.04486775 0.04888487 0.05014682] mean value: 0.0507551908493042 key: score_time value: [0.01139712 0.0113914 0.01027155 0.01038671 0.0105803 0.01110101 0.0105691 0.0107646 0.01118851 0.01136994] mean value: 0.010902023315429688 key: test_mcc value: [0.66143783 0.87287156 1. 0.8660254 0.71428571 0.71428571 0.74535599 0.1490712 0.8660254 0.42857143] mean value: 0.7017930244421767 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.93333333 1. 0.92857143 0.85714286 0.85714286 0.85714286 0.57142857 0.92857143 0.71428571] mean value: 0.8447619047619047 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 0.94117647 1. 0.92307692 0.85714286 0.85714286 0.83333333 0.625 0.93333333 0.71428571] mean value: 0.850802090066796 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.88888889 1. 1. 0.85714286 0.85714286 1. 0.55555556 0.875 0.71428571] mean value: 0.8448015873015873 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.85714286 0.85714286 0.85714286 0.71428571 0.71428571 1. 0.71428571] mean value: 0.8714285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 0.92857143 1. 0.92857143 0.85714286 0.85714286 0.85714286 0.57142857 0.92857143 0.71428571] mean value: 0.8455357142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 0.88888889 1. 0.85714286 0.75 0.75 0.71428571 0.45454545 0.875 0.55555556] mean value: 0.754541847041847 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.04 Accuracy on Blind test: 0.51 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.02718544 0.04948449 0.05275106 0.04704428 0.05495071 0.05065632 0.04379416 0.04743075 0.04433584 0.05247545] mean value: 0.04701085090637207 key: score_time value: [0.02026653 0.02336526 0.01186585 0.02184844 0.01182151 0.02222586 0.02072549 0.02242494 0.02011228 0.02445412] mean value: 0.019911026954650878 key: test_mcc value: [-0.04029115 0.09449112 0.57735027 0.42857143 0. 0.28867513 0. 0. 0.1490712 0. ] mean value: 0.1497868000906891 key: train_mcc value: [1. 1. 1. 1. 1. 0.96922337 1. 1. 1. 1. ] mean value: 0.9969223369195119 key: test_accuracy value: [0.46666667 0.53333333 0.78571429 0.71428571 0.5 0.64285714 0.5 0.5 0.57142857 0.5 ] mean value: 0.5714285714285714 key: train_accuracy value: [1. 1. 1. 1. 1. 0.984375 1. 1. 1. 1. ] mean value: 0.9984375 key: test_fscore value: [0.55555556 0.46153846 0.76923077 0.71428571 0.36363636 0.61538462 0.36363636 0.53333333 0.625 0.46153846] mean value: 0.5463139638139638 key: train_fscore value: [1. 1. 1. 1. 1. 0.98461538 1. 1. 1. 1. ] mean value: 0.9984615384615385 key: test_precision value: [0.45454545 0.6 0.83333333 0.71428571 0.5 0.66666667 0.5 0.5 0.55555556 0.5 ] mean value: 0.5824386724386724 key: train_precision value: [1. 1. 1. 1. 1. 0.96969697 1. 1. 1. 1. ] mean value: 0.996969696969697 key: test_recall value: [0.71428571 0.375 0.71428571 0.71428571 0.28571429 0.57142857 0.28571429 0.57142857 0.71428571 0.42857143] mean value: 0.5375 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.48214286 0.54464286 0.78571429 0.71428571 0.5 0.64285714 0.5 0.5 0.57142857 0.5 ] mean value: 0.5741071428571428 key: train_roc_auc value: [1. 1. 1. 1. 1. 0.984375 1. 1. 1. 1. ] mean value: 0.9984375 key: test_jcc value: [0.38461538 0.3 0.625 0.55555556 0.22222222 0.44444444 0.22222222 0.36363636 0.45454545 0.3 ] mean value: 0.3872241647241647 key: train_jcc value: [1. 1. 1. 1. 1. 0.96969697 1. 1. 1. 1. ] mean value: 0.996969696969697 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0241859 0.00881934 0.00854516 0.00849938 0.00865054 0.00857329 0.00858045 0.00881457 0.00852299 0.00857997] mean value: 0.010177159309387207 key: score_time value: [0.00976634 0.00857759 0.00845194 0.00852895 0.00844574 0.00858092 0.00846004 0.00851583 0.0085721 0.00858021] mean value: 0.008647966384887695 key: test_mcc value: [ 0.21821789 0.32732684 0.17407766 0.71428571 -0.1490712 0.57735027 0.1490712 0.4472136 0.1490712 0.1490712 ] mean value: 0.2756614357520949 key: train_mcc value: [0.38660962 0.3754942 0.42824786 0.39298268 0.42442129 0.39298268 0.38177086 0.34442336 0.44095855 0.43943537] mean value: 0.40073264569464856 key: test_accuracy value: [0.6 0.66666667 0.57142857 0.85714286 0.42857143 0.78571429 0.57142857 0.71428571 0.57142857 0.57142857] mean value: 0.6338095238095238 key: train_accuracy value: [0.69291339 0.68503937 0.7109375 0.6953125 0.7109375 0.6953125 0.6875 0.671875 0.71875 0.71875 ] mean value: 0.6987327755905511 key: test_fscore value: [0.625 0.70588235 0.66666667 0.85714286 0.5 0.8 0.5 0.75 0.5 0.625 ] mean value: 0.6529691876750701 key: train_fscore value: [0.70676692 0.70588235 0.73381295 0.71111111 0.72592593 0.71111111 0.71428571 0.68181818 0.73529412 0.73134328] mean value: 0.715735166535589 key: test_precision value: [0.55555556 0.66666667 0.54545455 0.85714286 0.44444444 0.75 0.6 0.66666667 0.6 0.55555556] mean value: 0.6241486291486291 key: train_precision value: [0.68115942 0.65753425 0.68 0.67605634 0.69014085 0.67605634 0.65789474 0.66176471 0.69444444 0.7 ] mean value: 0.6775051075160861 key: test_recall value: [0.71428571 0.75 0.85714286 0.85714286 0.57142857 0.85714286 0.42857143 0.85714286 0.42857143 0.71428571] mean value: 0.7035714285714285 key: train_recall value: [0.734375 0.76190476 0.796875 0.75 0.765625 0.75 0.78125 0.703125 0.78125 0.765625 ] mean value: 0.7590029761904762 key: test_roc_auc value: [0.60714286 0.66071429 0.57142857 0.85714286 0.42857143 0.78571429 0.57142857 0.71428571 0.57142857 0.57142857] mean value: 0.6339285714285714 key: train_roc_auc value: [0.69258433 0.68563988 0.7109375 0.6953125 0.7109375 0.6953125 0.6875 0.671875 0.71875 0.71875 ] mean value: 0.6987599206349207 key: test_jcc value: [0.45454545 0.54545455 0.5 0.75 0.33333333 0.66666667 0.33333333 0.6 0.33333333 0.45454545] mean value: 0.4971212121212121 key: train_jcc value: [0.54651163 0.54545455 0.57954545 0.55172414 0.56976744 0.55172414 0.55555556 0.51724138 0.58139535 0.57647059] mean value: 0.5575390217567915 MCC on Blind test: 0.32 Accuracy on Blind test: 0.66 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01038313 0.01427412 0.01598454 0.01353955 0.01545024 0.01576066 0.01404667 0.01471567 0.01347804 0.01635647] mean value: 0.014398908615112305 key: score_time value: [0.00853682 0.01141334 0.01140761 0.01149416 0.01144385 0.01149607 0.01146412 0.01140809 0.01136732 0.01147699] mean value: 0.011150836944580078 key: test_mcc value: [0.46428571 0.56407607 0.71428571 0.71428571 0.42857143 0.74535599 0.57735027 0.57735027 0.17407766 0.28867513] mean value: 0.5248313967676029 key: train_mcc value: [0.86101708 0.72678367 0.82717019 0.80168466 0.85042006 0.84063468 0.78756153 0.90625 0.77459667 0.93933644] mean value: 0.8315454985026968 key: test_accuracy value: [0.73333333 0.73333333 0.85714286 0.85714286 0.71428571 0.85714286 0.78571429 0.78571429 0.57142857 0.64285714] mean value: 0.7538095238095238 key: train_accuracy value: [0.92913386 0.8503937 0.90625 0.8984375 0.921875 0.9140625 0.8828125 0.953125 0.875 0.96875 ] mean value: 0.9099840059055118 key: test_fscore value: [0.71428571 0.66666667 0.85714286 0.85714286 0.71428571 0.83333333 0.8 0.8 0.66666667 0.61538462] mean value: 0.7524908424908424 key: train_fscore value: [0.92682927 0.82568807 0.89655172 0.9037037 0.92647059 0.90598291 0.8951049 0.953125 0.88888889 0.96969697] mean value: 0.9092042017437767 key: test_precision value: [0.71428571 1. 0.85714286 0.85714286 0.71428571 1. 0.75 0.75 0.54545455 0.66666667] mean value: 0.7854978354978355 key: train_precision value: [0.96610169 0.97826087 1. 0.85915493 0.875 1. 0.81012658 0.953125 0.8 0.94117647] mean value: 0.9182945546924652 key: test_recall value: [0.71428571 0.5 0.85714286 0.85714286 0.71428571 0.71428571 0.85714286 0.85714286 0.85714286 0.57142857] mean value: 0.75 key: train_recall value: [0.890625 0.71428571 0.8125 0.953125 0.984375 0.828125 1. 0.953125 1. 1. ] mean value: 0.9136160714285715 key: test_roc_auc value: [0.73214286 0.75 0.85714286 0.85714286 0.71428571 0.85714286 0.78571429 0.78571429 0.57142857 0.64285714] mean value: 0.7553571428571428 key: train_roc_auc value: [0.92943948 0.84933036 0.90625 0.8984375 0.921875 0.9140625 0.8828125 0.953125 0.875 0.96875 ] mean value: 0.9099082341269842 key: test_jcc value: [0.55555556 0.5 0.75 0.75 0.55555556 0.71428571 0.66666667 0.66666667 0.5 0.44444444] mean value: 0.6103174603174604 key: train_jcc value: [0.86363636 0.703125 0.8125 0.82432432 0.8630137 0.828125 0.81012658 0.91044776 0.8 0.94117647] mean value: 0.8356475200651571 MCC on Blind test: 0.31 Accuracy on Blind test: 0.62 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01415157 0.01356316 0.01522613 0.01370096 0.01369357 0.01361775 0.01383018 0.0135622 0.01277542 0.01345587] mean value: 0.013757681846618653 key: score_time value: [0.0115037 0.01245689 0.01166725 0.01139021 0.01141787 0.01142097 0.01140141 0.01140237 0.01158595 0.02333975] mean value: 0.012758636474609375 key: test_mcc value: [0.37796447 0. 0.74535599 0.8660254 0.28867513 0.57735027 0.63245553 0.17407766 0. 0.1490712 ] mean value: 0.38109756595673944 key: train_mcc value: [0.89071137 0.35476806 0.64978629 0.83643673 0.85042006 0.87542756 0.85947992 0.45557345 0.50487816 0.90669283] mean value: 0.7184174438156392 key: test_accuracy value: [0.66666667 0.46666667 0.85714286 0.92857143 0.64285714 0.78571429 0.78571429 0.57142857 0.5 0.57142857] mean value: 0.6776190476190476 key: train_accuracy value: [0.94488189 0.61417323 0.796875 0.9140625 0.921875 0.9375 0.9296875 0.671875 0.703125 0.953125 ] mean value: 0.8387180118110236 key: test_fscore value: [0.70588235 0. 0.83333333 0.92307692 0.61538462 0.8 0.72727273 0.4 0.63157895 0.5 ] mean value: 0.6136528899377196 key: train_fscore value: [0.94656489 0.36363636 0.74509804 0.90756303 0.91666667 0.93846154 0.92913386 0.51162791 0.77108434 0.95238095] mean value: 0.7982217573661332 key: test_precision value: [0.6 0. 1. 1. 0.66666667 0.75 1. 0.66666667 0.5 0.6 ] mean value: 0.6783333333333333 key: train_precision value: [0.92537313 1. 1. 0.98181818 0.98214286 0.92424242 0.93650794 1. 0.62745098 0.96774194] mean value: 0.9345277449915785 key: test_recall value: [0.85714286 0. 0.71428571 0.85714286 0.57142857 0.85714286 0.57142857 0.28571429 0.85714286 0.42857143] mean value: 0.6 key: train_recall value: [0.96875 0.22222222 0.59375 0.84375 0.859375 0.953125 0.921875 0.34375 1. 0.9375 ] mean value: 0.7644097222222223 key: test_roc_auc value: [0.67857143 0.5 0.85714286 0.92857143 0.64285714 0.78571429 0.78571429 0.57142857 0.5 0.57142857] mean value: 0.6821428571428572 key: train_roc_auc value: [0.94469246 0.61111111 0.796875 0.9140625 0.921875 0.9375 0.9296875 0.671875 0.703125 0.953125 ] mean value: 0.8383928571428572 key: test_jcc value: [0.54545455 0. 0.71428571 0.85714286 0.44444444 0.66666667 0.57142857 0.25 0.46153846 0.33333333] mean value: 0.4844294594294594 key: train_jcc value: [0.89855072 0.22222222 0.59375 0.83076923 0.84615385 0.88405797 0.86764706 0.34375 0.62745098 0.90909091] mean value: 0.7023442943104068 MCC on Blind test: 0.26 Accuracy on Blind test: 0.62 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.11391807 0.09409237 0.09487605 0.09567189 0.09379506 0.09356427 0.09453201 0.09791088 0.09554052 0.09554005] mean value: 0.09694411754608154 key: score_time value: [0.01464009 0.0145905 0.01488519 0.01464534 0.01463294 0.01461124 0.01467419 0.0150106 0.01514769 0.01581359] mean value: 0.01486513614654541 key: test_mcc value: [0.66143783 0.87287156 1. 1. 0.74535599 0.8660254 0.52223297 0.28867513 0.8660254 0.57735027] mean value: 0.7399974560430457 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.93333333 1. 1. 0.85714286 0.92857143 0.71428571 0.64285714 0.92857143 0.78571429] mean value: 0.8590476190476191 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 0.94117647 1. 1. 0.83333333 0.92307692 0.6 0.66666667 0.93333333 0.76923077] mean value: 0.8490346907993966 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.88888889 1. 1. 1. 1. 1. 0.625 0.875 0.83333333] mean value: 0.8922222222222222 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.71428571 0.85714286 0.42857143 0.71428571 1. 0.71428571] mean value: 0.8428571428571429 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 0.92857143 1. 1. 0.85714286 0.92857143 0.71428571 0.64285714 0.92857143 0.78571429] mean value: 0.8598214285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 0.88888889 1. 1. 0.71428571 0.85714286 0.42857143 0.5 0.875 0.625 ] mean value: 0.7588888888888888 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.0 Accuracy on Blind test: 0.5 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03230977 0.03465533 0.05089736 0.04256916 0.04040456 0.05487514 0.04169941 0.04757524 0.03255701 0.04550004] mean value: 0.04230430126190186 key: score_time value: [0.01723957 0.02700257 0.02792835 0.0355022 0.02915096 0.02598453 0.03048086 0.01950192 0.01615834 0.01654243] mean value: 0.024549174308776855 key: test_mcc value: [0.76376262 0.87287156 1. 1. 0.57735027 0.8660254 0.74535599 0.1490712 0.8660254 0.57735027] mean value: 0.7417812713717987 key: train_mcc value: [1. 1. 0.98449518 0.95324137 0.95324137 0.96922337 0.98449518 0.96922337 0.96922337 1. ] mean value: 0.9783143216676922 key: test_accuracy value: [0.86666667 0.93333333 1. 1. 0.78571429 0.92857143 0.85714286 0.57142857 0.92857143 0.78571429] mean value: 0.8657142857142857 key: train_accuracy value: [1. 1. 0.9921875 0.9765625 0.9765625 0.984375 0.9921875 0.984375 0.984375 1. ] mean value: 0.9890625 key: test_fscore value: [0.875 0.94117647 1. 1. 0.76923077 0.93333333 0.83333333 0.625 0.93333333 0.76923077] mean value: 0.8679638009049774 key: train_fscore value: [1. 1. 0.99212598 0.97637795 0.97674419 0.98412698 0.99224806 0.98412698 0.98412698 1. ] mean value: 0.9889877137450842 key: test_precision value: [0.77777778 0.88888889 1. 1. 0.83333333 0.875 1. 0.55555556 0.875 0.83333333] mean value: 0.8638888888888889 key: train_precision value: [1. 1. 1. 0.98412698 0.96923077 1. 0.98461538 1. 1. 1. ] mean value: 0.9937973137973138 key: test_recall value: [1. 1. 1. 1. 0.71428571 1. 0.71428571 0.71428571 1. 0.71428571] mean value: 0.8857142857142857 key: train_recall value: [1. 1. 0.984375 0.96875 0.984375 0.96875 1. 0.96875 0.96875 1. ] mean value: 0.984375 key: test_roc_auc value: [0.875 0.92857143 1. 1. 0.78571429 0.92857143 0.85714286 0.57142857 0.92857143 0.78571429] mean value: 0.8660714285714286 key: train_roc_auc value: [1. 1. 0.9921875 0.9765625 0.9765625 0.984375 0.9921875 0.984375 0.984375 1. ] mean value: 0.9890625 key: test_jcc value: [0.77777778 0.88888889 1. 1. 0.625 0.875 0.71428571 0.45454545 0.875 0.625 ] mean value: 0.7835497835497836 key: train_jcc value: [1. 1. 0.984375 0.95384615 0.95454545 0.96875 0.98461538 0.96875 0.96875 1. ] mean value: 0.9783631993006994 MCC on Blind test: 0.06 Accuracy on Blind test: 0.52 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.03254342 0.05661488 0.04824209 0.05007434 0.06980419 0.04639006 0.03822303 0.04851413 0.0455544 0.03839159] mean value: 0.047435212135314944 key: score_time value: [0.02400923 0.0254271 0.02423596 0.02527332 0.02484155 0.02546525 0.02542686 0.02713227 0.02601194 0.02398038] mean value: 0.02518038749694824 key: test_mcc value: [ 0.32732684 -0.19642857 0.4472136 0.63245553 0.28867513 0.1490712 0.28867513 0.14285714 -0.4472136 0. ] mean value: 0.16326324065058476 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.4 0.71428571 0.78571429 0.64285714 0.57142857 0.64285714 0.57142857 0.28571429 0.5 ] mean value: 0.5780952380952381 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.61538462 0.4 0.66666667 0.72727273 0.66666667 0.5 0.61538462 0.57142857 0.16666667 0.53333333] mean value: 0.5462803862803862 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.42857143 0.8 1. 0.625 0.6 0.66666667 0.57142857 0.2 0.5 ] mean value: 0.6058333333333333 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.57142857 0.375 0.57142857 0.57142857 0.71428571 0.42857143 0.57142857 0.57142857 0.14285714 0.57142857] mean value: 0.5089285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66071429 0.40178571 0.71428571 0.78571429 0.64285714 0.57142857 0.64285714 0.57142857 0.28571429 0.5 ] mean value: 0.5776785714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.44444444 0.25 0.5 0.57142857 0.5 0.33333333 0.44444444 0.4 0.09090909 0.36363636] mean value: 0.3898196248196248 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.27084661 0.25251555 0.25915909 0.25225282 0.25605011 0.25171423 0.25446415 0.25518131 0.25559974 0.24836516] mean value: 0.25561487674713135 key: score_time value: [0.00921893 0.00908136 0.00908971 0.0089159 0.00926518 0.00900149 0.00906825 0.00937915 0.00925112 0.00913453] mean value: 0.009140563011169434 key: test_mcc value: [0.66143783 0.87287156 1. 1. 0.71428571 0.8660254 0.74535599 0.31622777 0.8660254 0.42857143] mean value: 0.7470801097652905 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.93333333 1. 1. 0.85714286 0.92857143 0.85714286 0.64285714 0.92857143 0.71428571] mean value: 0.8661904761904762 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.82352941 0.94117647 1. 1. 0.85714286 0.93333333 0.83333333 0.70588235 0.93333333 0.71428571] mean value: 0.8742016806722689 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.88888889 1. 1. 0.85714286 0.875 1. 0.6 0.875 0.71428571] mean value: 0.851031746031746 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.85714286 1. 0.71428571 0.85714286 1. 0.71428571] mean value: 0.9142857142857143 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8125 0.92857143 1. 1. 0.85714286 0.92857143 0.85714286 0.64285714 0.92857143 0.71428571] mean value: 0.8669642857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.7 0.88888889 1. 1. 0.75 0.875 0.71428571 0.54545455 0.875 0.55555556] mean value: 0.7904184704184705 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.52 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01737309 0.016608 0.01730108 0.01654887 0.04343247 0.01705122 0.02059937 0.01725125 0.01745582 0.01699615] mean value: 0.020061731338500977 key: score_time value: [0.01210737 0.0118475 0.01199269 0.01186085 0.01219201 0.01192856 0.01498175 0.01467967 0.01503849 0.01461554] mean value: 0.013124442100524903 key: test_mcc value: [ 0.05455447 0.20044593 0.14285714 -0.40824829 -0.14285714 -0.14285714 -0.28867513 -0.14285714 -0.1490712 0. ] mean value: -0.08767085052796311 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.53333333 0.6 0.57142857 0.35714286 0.42857143 0.42857143 0.35714286 0.42857143 0.42857143 0.5 ] mean value: 0.4633333333333333 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.46153846 0.7 0.57142857 0.52631579 0.42857143 0.42857143 0.30769231 0.42857143 0.5 0.46153846] mean value: 0.4814227877385772 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.5 0.58333333 0.57142857 0.41666667 0.42857143 0.42857143 0.33333333 0.42857143 0.44444444 0.5 ] mean value: 0.4634920634920635 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.42857143 0.875 0.57142857 0.71428571 0.42857143 0.42857143 0.28571429 0.42857143 0.57142857 0.42857143] mean value: 0.5160714285714285 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.52678571 0.58035714 0.57142857 0.35714286 0.42857143 0.42857143 0.35714286 0.42857143 0.42857143 0.5 ] mean value: 0.4607142857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.3 0.53846154 0.4 0.35714286 0.27272727 0.27272727 0.18181818 0.27272727 0.33333333 0.3 ] mean value: 0.32289377289377286 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.03 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03756571 0.01294708 0.01308179 0.01303172 0.01298833 0.01315022 0.01306343 0.012995 0.01294088 0.02156067] mean value: 0.016332483291625975 key: score_time value: [0.0116086 0.01149893 0.01147532 0.01145434 0.01147461 0.01149917 0.01148558 0.0114975 0.0114572 0.01152682] mean value: 0.011497807502746583 key: test_mcc value: [0.21821789 0.26189246 0.71428571 0.74535599 0.74535599 0.8660254 0.8660254 0.4472136 0.28867513 0.4472136 ] mean value: 0.5600261185993421 key: train_mcc value: [0.93748452 0.93748452 0.92288947 0.89073374 0.89073374 0.90625 0.85947992 0.95324137 0.95417386 0.9379581 ] mean value: 0.9190429255191599 key: test_accuracy value: [0.6 0.6 0.85714286 0.85714286 0.85714286 0.92857143 0.92857143 0.71428571 0.64285714 0.71428571] mean value: 0.77 key: train_accuracy value: [0.96850394 0.96850394 0.9609375 0.9453125 0.9453125 0.953125 0.9296875 0.9765625 0.9765625 0.96875 ] mean value: 0.9593257874015748 key: test_fscore value: [0.625 0.5 0.85714286 0.83333333 0.83333333 0.92307692 0.92307692 0.75 0.66666667 0.66666667] mean value: 0.7578296703296703 key: train_fscore value: [0.96825397 0.96875 0.96183206 0.94573643 0.94573643 0.953125 0.92913386 0.97674419 0.97709924 0.96923077] mean value: 0.9595641947725944 key: test_precision value: [0.55555556 0.75 0.85714286 1. 1. 1. 1. 0.66666667 0.625 0.8 ] mean value: 0.8254365079365079 key: train_precision value: [0.98387097 0.95384615 0.94029851 0.93846154 0.93846154 0.953125 0.93650794 0.96923077 0.95522388 0.95454545] mean value: 0.9523571746855028 key: test_recall value: [0.71428571 0.375 0.85714286 0.71428571 0.71428571 0.85714286 0.85714286 0.85714286 0.71428571 0.57142857] mean value: 0.7232142857142857 key: train_recall value: /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:175: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:178: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [0.953125 0.98412698 0.984375 0.953125 0.953125 0.953125 0.921875 0.984375 1. 0.984375 ] mean value: 0.9671626984126984 key: test_roc_auc value: [0.60714286 0.61607143 0.85714286 0.85714286 0.85714286 0.92857143 0.92857143 0.71428571 0.64285714 0.71428571] mean value: 0.7723214285714286 key: train_roc_auc value: [0.96862599 0.96862599 0.9609375 0.9453125 0.9453125 0.953125 0.9296875 0.9765625 0.9765625 0.96875 ] mean value: 0.9593501984126984 key: test_jcc value: [0.45454545 0.33333333 0.75 0.71428571 0.71428571 0.85714286 0.85714286 0.6 0.5 0.5 ] mean value: 0.628073593073593 key: train_jcc value: [0.93846154 0.93939394 0.92647059 0.89705882 0.89705882 0.91044776 0.86764706 0.95454545 0.95522388 0.94029851] mean value: 0.922660637577231 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.17722702 0.11350346 0.21813035 0.2076714 0.21254301 0.24001646 0.32121396 0.21115351 0.19236588 0.18778825] mean value: 0.2081613302230835 key: score_time value: [0.0201118 0.01175857 0.01412797 0.02275658 0.02106333 0.02480865 0.02300787 0.02113318 0.0210979 0.0177002 ] mean value: 0.019756603240966796 key: test_mcc value: [0.21821789 0.26189246 0.71428571 0.74535599 0.74535599 0.8660254 0.8660254 0.4472136 0.28867513 0.4472136 ] mean value: 0.5600261185993421 key: train_mcc value: [0.93748452 0.93748452 0.92288947 0.89073374 0.89073374 0.90625 0.85947992 0.95324137 0.95417386 0.9379581 ] mean value: 0.9190429255191599 key: test_accuracy value: [0.6 0.6 0.85714286 0.85714286 0.85714286 0.92857143 0.92857143 0.71428571 0.64285714 0.71428571] mean value: 0.77 key: train_accuracy value: [0.96850394 0.96850394 0.9609375 0.9453125 0.9453125 0.953125 0.9296875 0.9765625 0.9765625 0.96875 ] mean value: 0.9593257874015748 key: test_fscore value: [0.625 0.5 0.85714286 0.83333333 0.83333333 0.92307692 0.92307692 0.75 0.66666667 0.66666667] mean value: 0.7578296703296703 key: train_fscore value: [0.96825397 0.96875 0.96183206 0.94573643 0.94573643 0.953125 0.92913386 0.97674419 0.97709924 0.96923077] mean value: 0.9595641947725944 key: test_precision value: [0.55555556 0.75 0.85714286 1. 1. 1. 1. 0.66666667 0.625 0.8 ] mean value: 0.8254365079365079 key: train_precision value: [0.98387097 0.95384615 0.94029851 0.93846154 0.93846154 0.953125 0.93650794 0.96923077 0.95522388 0.95454545] mean value: 0.9523571746855028 key: test_recall value: [0.71428571 0.375 0.85714286 0.71428571 0.71428571 0.85714286 0.85714286 0.85714286 0.71428571 0.57142857] mean value: 0.7232142857142857 key: train_recall value: [0.953125 0.98412698 0.984375 0.953125 0.953125 0.953125 0.921875 0.984375 1. 0.984375 ] mean value: 0.9671626984126984 key: test_roc_auc value: [0.60714286 0.61607143 0.85714286 0.85714286 0.85714286 0.92857143 0.92857143 0.71428571 0.64285714 0.71428571] mean value: 0.7723214285714286 key: train_roc_auc value: [0.96862599 0.96862599 0.9609375 0.9453125 0.9453125 0.953125 0.9296875 0.9765625 0.9765625 0.96875 ] mean value: 0.9593501984126984 key: test_jcc value: [0.45454545 0.33333333 0.75 0.71428571 0.71428571 0.85714286 0.85714286 0.6 0.5 0.5 ] mean value: 0.628073593073593 key: train_jcc value: [0.93846154 0.93939394 0.92647059 0.89705882 0.89705882 0.91044776 0.86764706 0.95454545 0.95522388 0.94029851] mean value: 0.922660637577231 MCC on Blind test: 0.24 Accuracy on Blind test: 0.62 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03295398 0.03293633 0.04036927 0.07436967 0.05748558 0.04356003 0.07211637 0.03021264 0.03299451 0.03802323] mean value: 0.04550216197967529 key: score_time value: [0.01606202 0.01162028 0.0115943 0.01190186 0.01204824 0.02361083 0.02349329 0.01182771 0.01200891 0.01200128] mean value: 0.014616870880126953 key: test_mcc value: [0.39393939 0.66414149 0.65909298 0.48075018 0.74242424 0.74047959 0.74047959 0.74047959 0.56694671 0.48795004] mean value: 0.6216683798300241 key: train_mcc value: [0.80500813 0.85463818 0.89371934 0.86356283 0.84407425 0.86358877 0.86493273 0.88292404 0.85473156 0.87481777] mean value: 0.8601997596512527 key: test_accuracy value: [0.69565217 0.82608696 0.82608696 0.73913043 0.86956522 0.86956522 0.86956522 0.86956522 0.77272727 0.72727273] mean value: 0.8065217391304348 key: train_accuracy value: [0.90243902 0.92682927 0.94634146 0.93170732 0.92195122 0.93170732 0.93170732 0.94146341 0.92718447 0.9368932 ] mean value: 0.9298224011366327 key: test_fscore value: [0.69565217 0.83333333 0.8 0.7 0.86956522 0.88 0.88 0.88 0.8 0.66666667] mean value: 0.8005217391304347 key: train_fscore value: [0.90384615 0.92890995 0.9478673 0.93269231 0.9223301 0.93203883 0.93333333 0.94117647 0.92822967 0.93838863] mean value: 0.9308812739347887 key: test_precision value: [0.66666667 0.76923077 0.88888889 0.77777778 0.90909091 0.84615385 0.84615385 0.84615385 0.71428571 0.85714286] mean value: 0.8121545121545122 key: train_precision value: [0.8952381 0.90740741 0.92592593 0.92380952 0.91346154 0.92307692 0.90740741 0.94117647 0.91509434 0.91666667] mean value: 0.9169264298204365 key: test_recall value: [0.72727273 0.90909091 0.72727273 0.63636364 0.83333333 0.91666667 0.91666667 0.91666667 0.90909091 0.54545455] mean value: 0.8037878787878787 key: train_recall value: [0.91262136 0.95145631 0.97087379 0.94174757 0.93137255 0.94117647 0.96078431 0.94117647 0.94174757 0.96116505] mean value: 0.9454121454407005 key: test_roc_auc value: [0.6969697 0.82954545 0.8219697 0.73484848 0.87121212 0.86742424 0.86742424 0.86742424 0.77272727 0.72727273] mean value: 0.8056818181818182 key: train_roc_auc value: [0.90238911 0.92670855 0.94622121 0.9316581 0.92199695 0.93175328 0.93184847 0.94146202 0.92718447 0.9368932 ] mean value: 0.9298115362649915 key: test_jcc value: [0.53333333 0.71428571 0.66666667 0.53846154 0.76923077 0.78571429 0.78571429 0.78571429 0.66666667 0.5 ] mean value: 0.6745787545787546 key: train_jcc value: [0.8245614 0.86725664 0.9009009 0.87387387 0.85585586 0.87272727 0.875 0.88888889 0.86607143 0.88392857] mean value: 0.8709064832923705 MCC on Blind test: 0.34 Accuracy on Blind test: 0.67 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.93171263 0.76386046 0.90062022 0.77434134 0.75542283 0.82477236 0.75631881 0.79757857 0.92382717 0.79171753] mean value: 0.8220171928405762 key: score_time value: [0.01185131 0.0120995 0.02216887 0.01233625 0.01512313 0.01532745 0.01551938 0.01230907 0.01238847 0.01561093] mean value: 0.014473438262939453 key: test_mcc value: [0.82575758 0.74047959 0.76277007 0.56818182 0.76764947 0.82575758 0.74242424 0.91666667 0.54772256 0.75592895] mean value: 0.7453338517356568 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.91369855 1. ] mean value: 0.9913698554847693 key: test_accuracy value: [0.91304348 0.86956522 0.86956522 0.7826087 0.86956522 0.91304348 0.86956522 0.95652174 0.77272727 0.86363636] mean value: 0.8679841897233201 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 0.95631068 1. ] mean value: 0.9956310679611651 key: test_fscore value: [0.90909091 0.85714286 0.84210526 0.7826087 0.85714286 0.91666667 0.86956522 0.95652174 0.7826087 0.84210526] mean value: 0.8615558164185166 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 0.95734597 1. ] mean value: 0.9957345971563981 key: test_precision value: [0.90909091 0.9 1. 0.75 1. 0.91666667 0.90909091 1. 0.75 1. ] mean value: 0.9134848484848485 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 0.93518519 1. ] mean value: 0.9935185185185185 key: test_recall value: [0.90909091 0.81818182 0.72727273 0.81818182 0.75 0.91666667 0.83333333 0.91666667 0.81818182 0.72727273] mean value: 0.8234848484848485 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.98058252 1. ] mean value: 0.9980582524271845 key: test_roc_auc value: [0.91287879 0.86742424 0.86363636 0.78409091 0.875 0.91287879 0.87121212 0.95833333 0.77272727 0.86363636] mean value: 0.8681818181818182 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.95631068 1. ] mean value: 0.9956310679611651 key: test_jcc value: [0.83333333 0.75 0.72727273 0.64285714 0.75 0.84615385 0.76923077 0.91666667 0.64285714 0.72727273] mean value: 0.7605644355644355 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.91818182 1. ] mean value: 0.9918181818181818 MCC on Blind test: 0.19 Accuracy on Blind test: 0.59 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0132544 0.01033616 0.00911188 0.009161 0.00958657 0.00882697 0.00929618 0.00892305 0.0093689 0.00876236] mean value: 0.009662747383117676 key: score_time value: [0.01657176 0.00902963 0.00913548 0.00981712 0.00965953 0.00855589 0.0086391 0.00849652 0.00863767 0.008816 ] mean value: 0.009735870361328124 key: test_mcc value: [0.11236664 0.56490196 0.65151515 0.06579517 0.22407133 0.50168817 0.58002308 0.42228828 0.48795004 0.09759001] mean value: 0.37081898188601464 key: train_mcc value: [0.41031528 0.49366174 0.51698955 0.40881923 0.40551208 0.45203295 0.49026396 0.44322953 0.45669396 0.43639645] mean value: 0.45139147236435284 key: test_accuracy value: [0.52173913 0.7826087 0.82608696 0.52173913 0.60869565 0.73913043 0.7826087 0.69565217 0.72727273 0.54545455] mean value: 0.675098814229249 key: train_accuracy value: [0.68292683 0.74634146 0.74146341 0.67317073 0.68780488 0.71707317 0.73170732 0.71219512 0.7184466 0.70873786] mean value: 0.7119867392848686 key: test_fscore value: [0.64516129 0.76190476 0.81818182 0.59259259 0.68965517 0.78571429 0.81481481 0.75862069 0.76923077 0.61538462] mean value: 0.7251260810215204 key: train_fscore value: [0.743083 0.74 0.781893 0.74329502 0.73553719 0.75 0.76793249 0.74678112 0.75423729 0.74576271] mean value: 0.7508521822638834 key: test_precision value: [0.5 0.8 0.81818182 0.5 0.58823529 0.6875 0.73333333 0.64705882 0.66666667 0.53333333] mean value: 0.6474309269162211 key: train_precision value: [0.62666667 0.7628866 0.67857143 0.61392405 0.63571429 0.66923077 0.67407407 0.66412214 0.66917293 0.66165414] mean value: 0.6656017077902033 key: test_recall value: [0.90909091 0.72727273 0.81818182 0.72727273 0.83333333 0.91666667 0.91666667 0.91666667 0.90909091 0.72727273] mean value: 0.8401515151515151 key: train_recall value: [0.91262136 0.7184466 0.9223301 0.94174757 0.87254902 0.85294118 0.89215686 0.85294118 0.86407767 0.85436893] mean value: 0.8684180468303826 key: test_roc_auc value: [0.53787879 0.78030303 0.82575758 0.53030303 0.59848485 0.73106061 0.77651515 0.68560606 0.72727273 0.54545455] mean value: 0.6738636363636363 key: train_roc_auc value: [0.68180088 0.7464782 0.74057681 0.67185418 0.68870169 0.71773272 0.7324862 0.71287836 0.7184466 0.70873786] mean value: 0.711969350847135 key: test_jcc value: [0.47619048 0.61538462 0.69230769 0.42105263 0.52631579 0.64705882 0.6875 0.61111111 0.625 0.44444444] mean value: 0.5746365584020383 key: train_jcc value: [0.59119497 0.58730159 0.64189189 0.59146341 0.58169935 0.6 0.62328767 0.59589041 0.60544218 0.59459459] mean value: 0.6012766062443438 MCC on Blind test: 0.45 Accuracy on Blind test: 0.71 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01010966 0.00937247 0.0098629 0.00893044 0.00894618 0.0098803 0.00925088 0.00918436 0.00915742 0.00916314] mean value: 0.009385776519775391 key: score_time value: [0.00906849 0.00885463 0.00871778 0.00924611 0.00887156 0.00888371 0.00867105 0.00927925 0.00878453 0.00857878] mean value: 0.008895587921142579 key: test_mcc value: [0.21969697 0.55048188 0.22407133 0.21452908 0.3030303 0.3030303 0.33371191 0.39393939 0.09090909 0.32539569] mean value: 0.29587959510446155 key: train_mcc value: [0.44146616 0.44911432 0.45709726 0.49637007 0.4861007 0.48652841 0.43786483 0.44832571 0.49218702 0.50892419] mean value: 0.4703978666309494 key: test_accuracy value: [0.60869565 0.73913043 0.60869565 0.60869565 0.65217391 0.65217391 0.65217391 0.69565217 0.54545455 0.63636364] mean value: 0.6399209486166008 key: train_accuracy value: [0.71707317 0.72195122 0.72682927 0.74634146 0.73658537 0.74146341 0.71707317 0.72195122 0.74271845 0.75242718] mean value: 0.7324413923750888 key: test_fscore value: [0.60869565 0.625 0.47058824 0.52631579 0.66666667 0.66666667 0.6 0.69565217 0.54545455 0.5 ] mean value: 0.5905039729642637 key: train_fscore value: [0.69148936 0.70157068 0.71134021 0.73195876 0.7 0.72251309 0.69473684 0.6984127 0.71957672 0.7357513 ] mean value: 0.7107349655839269 key: test_precision value: [0.58333333 1. 0.66666667 0.625 0.66666667 0.66666667 0.75 0.72727273 0.54545455 0.8 ] mean value: 0.7031060606060606 key: train_precision value: [0.76470588 0.76136364 0.75824176 0.78021978 0.80769231 0.7752809 0.75 0.75862069 0.79069767 0.78888889] mean value: 0.7735711516709494 key: test_recall value: [0.63636364 0.45454545 0.36363636 0.45454545 0.66666667 0.66666667 0.5 0.66666667 0.54545455 0.36363636] mean value: 0.5318181818181817 key: train_recall value: [0.63106796 0.65048544 0.66990291 0.68932039 0.61764706 0.67647059 0.64705882 0.64705882 0.66019417 0.68932039] mean value: 0.657852655625357 key: test_roc_auc value: [0.60984848 0.72727273 0.59848485 0.60227273 0.65151515 0.65151515 0.65909091 0.6969697 0.54545455 0.63636364] mean value: 0.6378787878787878 key: train_roc_auc value: [0.71749476 0.72230154 0.72710832 0.74662098 0.736008 0.74114792 0.7167333 0.72158766 0.74271845 0.75242718] mean value: 0.732414810584428 key: test_jcc value: [0.4375 0.45454545 0.30769231 0.35714286 0.5 0.5 0.42857143 0.53333333 0.375 0.33333333] mean value: 0.42271187146187145 key: train_jcc value: [0.52845528 0.54032258 0.552 0.57723577 0.53846154 0.56557377 0.53225806 0.53658537 0.56198347 0.58196721] mean value: 0.5514843061067994 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.0090549 0.00948048 0.00953913 0.00963783 0.00954986 0.00989366 0.00965118 0.00976753 0.00973725 0.00958776] mean value: 0.009589958190917968 key: score_time value: [0.01488662 0.01073503 0.01059341 0.01062608 0.01254487 0.01061916 0.01115012 0.01079106 0.01064491 0.01087499] mean value: 0.01134662628173828 key: test_mcc value: [ 0.04545455 0.03178209 -0.06579517 0.30240737 0.15096491 -0.31298622 0.13740858 0.31252706 0.18898224 -0.09245003] mean value: 0.06982953647576104 key: train_mcc value: [0.58048549 0.45409531 0.47798272 0.45409531 0.51440766 0.52244835 0.48780456 0.43416169 0.52548679 0.56526885] mean value: 0.5016236722758309 key: test_accuracy value: [0.52173913 0.52173913 0.47826087 0.65217391 0.56521739 0.34782609 0.56521739 0.65217391 0.59090909 0.45454545] mean value: 0.5349802371541502 key: train_accuracy value: [0.7902439 0.72682927 0.73658537 0.72682927 0.75609756 0.76097561 0.74146341 0.71707317 0.76213592 0.7815534 ] mean value: 0.7499786881363959 key: test_fscore value: [0.52173913 0.42105263 0.33333333 0.6 0.5 0.4 0.54545455 0.71428571 0.52631579 0.4 ] mean value: 0.4962181144561007 key: train_fscore value: [0.79227053 0.72277228 0.71875 0.72277228 0.74226804 0.75376884 0.71957672 0.71287129 0.75376884 0.7715736 ] mean value: 0.7410392426302083 key: test_precision value: [0.5 0.5 0.42857143 0.66666667 0.625 0.38461538 0.6 0.625 0.625 0.44444444] mean value: 0.5399297924297924 key: train_precision value: [0.78846154 0.73737374 0.7752809 0.73737374 0.7826087 0.77319588 0.7816092 0.72 0.78125 0.80851064] mean value: 0.7685664317726423 key: test_recall value: [0.54545455 0.36363636 0.27272727 0.54545455 0.41666667 0.41666667 0.5 0.83333333 0.45454545 0.36363636] mean value: 0.4712121212121212 key: train_recall value: [0.7961165 0.70873786 0.66990291 0.70873786 0.70588235 0.73529412 0.66666667 0.70588235 0.72815534 0.73786408] mean value: 0.7163240053302875 key: test_roc_auc value: [0.52272727 0.51515152 0.46969697 0.64772727 0.5719697 0.34469697 0.56818182 0.64393939 0.59090909 0.45454545] mean value: 0.5329545454545455 key: train_roc_auc value: [0.79021512 0.72691795 0.73691224 0.72691795 0.7558538 0.76085094 0.74110032 0.71701885 0.76213592 0.7815534 ] mean value: 0.7499476489624977 key: test_jcc value: [0.35294118 0.26666667 0.2 0.42857143 0.33333333 0.25 0.375 0.55555556 0.35714286 0.25 ] mean value: 0.33692110177404294 key: train_jcc value: [0.656 0.56589147 0.56097561 0.56589147 0.59016393 0.60483871 0.56198347 0.55384615 0.60483871 0.62809917] mean value: 0.5892528707747853 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01453853 0.01178885 0.01168919 0.01257873 0.01363349 0.01344967 0.01352501 0.01355839 0.0126698 0.01276493] mean value: 0.013019657135009766 key: score_time value: [0.01078439 0.00963712 0.00947428 0.01007676 0.01063275 0.01030612 0.01044083 0.01055479 0.01006365 0.00953841] mean value: 0.010150909423828125 key: test_mcc value: [0.31298622 0.74242424 0.50168817 0.12878788 0.66414149 0.30240737 0.38932432 0.65151515 0.27272727 0.27272727] mean value: 0.42387293810042725 key: train_mcc value: [0.72894414 0.76647632 0.80552394 0.73821604 0.76638754 0.78600013 0.77647587 0.7954287 0.71848046 0.738735 ] mean value: 0.7620668135550838 key: test_accuracy value: [0.65217391 0.86956522 0.73913043 0.56521739 0.82608696 0.65217391 0.69565217 0.82608696 0.63636364 0.63636364] mean value: 0.7098814229249012 key: train_accuracy value: [0.86341463 0.88292683 0.90243902 0.86829268 0.88292683 0.89268293 0.88780488 0.89756098 0.8592233 0.86893204] mean value: 0.880620412029363 key: test_fscore value: [0.66666667 0.86956522 0.66666667 0.54545455 0.81818182 0.69230769 0.72 0.83333333 0.63636364 0.63636364] mean value: 0.70849032127293 key: train_fscore value: [0.86915888 0.88118812 0.9009901 0.87323944 0.88 0.89423077 0.88442211 0.89552239 0.85853659 0.87203791] mean value: 0.8809326300847204 key: test_precision value: [0.61538462 0.83333333 0.85714286 0.54545455 0.9 0.64285714 0.69230769 0.83333333 0.63636364 0.63636364] mean value: 0.7192540792540792 key: train_precision value: [0.83783784 0.8989899 0.91919192 0.84545455 0.89795918 0.87735849 0.90721649 0.90909091 0.8627451 0.85185185] mean value: 0.8807696229541047 key: test_recall value: [0.72727273 0.90909091 0.54545455 0.54545455 0.75 0.75 0.75 0.83333333 0.63636364 0.63636364] mean value: 0.7083333333333334 key: train_recall value: [0.90291262 0.86407767 0.88349515 0.90291262 0.8627451 0.91176471 0.8627451 0.88235294 0.85436893 0.89320388] mean value: 0.8820578716923663 key: test_roc_auc value: [0.65530303 0.87121212 0.73106061 0.56439394 0.82954545 0.64772727 0.69318182 0.82575758 0.63636364 0.63636364] mean value: 0.709090909090909 key: train_roc_auc value: [0.86322102 0.88301923 0.90253189 0.86812298 0.88282886 0.89277556 0.88768323 0.89748715 0.8592233 0.86893204] mean value: 0.8805825242718447 key: test_jcc value: [0.5 0.76923077 0.5 0.375 0.69230769 0.52941176 0.5625 0.71428571 0.46666667 0.46666667] mean value: 0.5576069273863391 key: train_jcc value: [0.76859504 0.78761062 0.81981982 0.775 0.78571429 0.80869565 0.79279279 0.81081081 0.75213675 0.77310924] mean value: 0.7874285017937194 MCC on Blind test: 0.46 Accuracy on Blind test: 0.73 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.32719898 0.17454052 0.83057833 0.54179835 0.57541323 0.82422447 0.67535782 0.35218549 0.46702242 1.07017779] mean value: 0.5838497400283813 key: score_time value: [0.01227832 0.01222968 0.01220989 0.01225519 0.01225281 0.01271915 0.01260519 0.0125823 0.01266718 0.0126586 ] mean value: 0.012445831298828125 key: test_mcc value: [0.44411739 0.12878788 0.58002308 0.30240737 0.56879646 0.56490196 0.65909298 0.50168817 0.13245324 0.46225016] mean value: 0.43445186796951096 key: train_mcc value: [0.52539178 0.50494514 0.92351163 0.73838965 0.65067908 0.85702512 0.79260855 0.58203168 0.58157543 0.93243443] mean value: 0.7088592486777825 key: test_accuracy value: [0.69565217 0.56521739 0.7826087 0.65217391 0.73913043 0.7826087 0.82608696 0.73913043 0.54545455 0.72727273] mean value: 0.7055335968379447 key: train_accuracy value: [0.74634146 0.74634146 0.96097561 0.86829268 0.80487805 0.92682927 0.88780488 0.7804878 0.75728155 0.96601942] mean value: 0.8445252190385981 key: test_fscore value: [0.74074074 0.54545455 0.73684211 0.6 0.66666667 0.8 0.84615385 0.78571429 0.28571429 0.7 ] mean value: 0.6707286475707528 key: train_fscore value: [0.78512397 0.7173913 0.96226415 0.86432161 0.76190476 0.92957746 0.89777778 0.80519481 0.6835443 0.96650718] mean value: 0.8373607320770611 key: test_precision value: [0.625 0.54545455 0.875 0.66666667 1. 0.76923077 0.78571429 0.6875 0.66666667 0.77777778] mean value: 0.7399010711510712 key: train_precision value: [0.68345324 0.81481481 0.93577982 0.89583333 0.96969697 0.89189189 0.82113821 0.72093023 0.98181818 0.95283019] mean value: 0.8668186878098524 key: test_recall value: [0.90909091 0.54545455 0.63636364 0.54545455 0.5 0.83333333 0.91666667 0.91666667 0.18181818 0.63636364] mean value: 0.6621212121212121 key: train_recall value: [0.9223301 0.6407767 0.99029126 0.83495146 0.62745098 0.97058824 0.99019608 0.91176471 0.52427184 0.98058252] mean value: 0.8393203883495146 key: test_roc_auc value: [0.70454545 0.56439394 0.77651515 0.64772727 0.75 0.78030303 0.8219697 0.73106061 0.54545455 0.72727273] mean value: 0.7049242424242423 key: train_roc_auc value: [0.74547877 0.74685894 0.96083191 0.86845612 0.80401675 0.92704169 0.88830192 0.78112507 0.75728155 0.96601942] mean value: 0.84454121454407 key: test_jcc value: [0.58823529 0.375 0.58333333 0.42857143 0.5 0.66666667 0.73333333 0.64705882 0.16666667 0.53846154] mean value: 0.5227327084680026 key: train_jcc value: [0.6462585 0.55932203 0.92727273 0.76106195 0.61538462 0.86842105 0.81451613 0.67391304 0.51923077 0.93518519] mean value: 0.7320566006417716 MCC on Blind test: 0.31 Accuracy on Blind test: 0.65 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01866984 0.01328373 0.01333165 0.01289773 0.01274443 0.01260185 0.01383185 0.01282859 0.01323938 0.01250935] mean value: 0.013593840599060058 key: score_time value: [0.01174784 0.00923467 0.00871539 0.00866914 0.0085988 0.00864315 0.00875974 0.00851679 0.00875902 0.00885749] mean value: 0.009050202369689942 key: test_mcc value: [0.82575758 0.91605722 0.69084928 0.76764947 0.76764947 0.91666667 0.74242424 1. 0.91287093 0.75592895] mean value: 0.8295853811736139 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91304348 0.95652174 0.82608696 0.86956522 0.86956522 0.95652174 0.86956522 1. 0.95454545 0.86363636] mean value: 0.9079051383399209 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.95238095 0.77777778 0.88 0.85714286 0.95652174 0.86956522 1. 0.95238095 0.84210526] mean value: 0.8996965668453083 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.90909091 1. 1. 0.78571429 1. 1. 0.90909091 1. 1. 1. ] mean value: 0.9603896103896103 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 0.90909091 0.63636364 1. 0.75 0.91666667 0.83333333 1. 0.90909091 0.72727273] mean value: 0.8590909090909091 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.91287879 0.95454545 0.81818182 0.875 0.875 0.95833333 0.87121212 1. 0.95454545 0.86363636] mean value: 0.9083333333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.90909091 0.63636364 0.78571429 0.75 0.91666667 0.76923077 1. 0.90909091 0.72727273] mean value: 0.8236763236763237 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.54 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10137177 0.09971786 0.09696341 0.09599876 0.10285091 0.09974742 0.10015845 0.10240602 0.10212231 0.09918237] mean value: 0.10005192756652832 key: score_time value: [0.01733947 0.0176208 0.01726961 0.01758814 0.01826119 0.01933861 0.01860666 0.01895905 0.0190897 0.01898289] mean value: 0.0183056116104126 key: test_mcc value: [0.74242424 0.91666667 0.65909298 0.39393939 0.74047959 0.56490196 0.76277007 1. 0.73029674 0.54772256] mean value: 0.7058294203018629 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.95652174 0.82608696 0.69565217 0.86956522 0.7826087 0.86956522 1. 0.86363636 0.77272727] mean value: 0.850592885375494 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.86956522 0.95652174 0.8 0.69565217 0.88 0.8 0.88888889 1. 0.86956522 0.76190476] mean value: 0.8522097998619738 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.83333333 0.91666667 0.88888889 0.66666667 0.84615385 0.76923077 0.8 1. 0.83333333 0.8 ] mean value: 0.8354273504273504 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90909091 1. 0.72727273 0.72727273 0.91666667 0.83333333 1. 1. 0.90909091 0.72727273] mean value: 0.875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.87121212 0.95833333 0.8219697 0.6969697 0.86742424 0.78030303 0.86363636 1. 0.86363636 0.77272727] mean value: 0.8496212121212121 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.76923077 0.91666667 0.66666667 0.53333333 0.78571429 0.66666667 0.8 1. 0.76923077 0.61538462] mean value: 0.7522893772893773 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.32 Accuracy on Blind test: 0.64 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01011992 0.01058984 0.01019692 0.00972962 0.01009941 0.01027632 0.01017356 0.01005864 0.0098815 0.0098474 ] mean value: 0.010097312927246093 key: score_time value: [0.00989771 0.00945807 0.00941896 0.00952983 0.00942016 0.00951624 0.00943208 0.00933743 0.00939727 0.00861669] mean value: 0.00940244197845459 key: test_mcc value: [0.47727273 0.82575758 0.56490196 0.30240737 0.44411739 0.66414149 0.39393939 0.66414149 0.29277002 0.46225016] mean value: 0.5091699576165252 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73913043 0.91304348 0.7826087 0.65217391 0.69565217 0.82608696 0.69565217 0.82608696 0.63636364 0.72727273] mean value: 0.7494071146245059 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72727273 0.90909091 0.76190476 0.6 0.63157895 0.81818182 0.69565217 0.81818182 0.55555556 0.7 ] mean value: 0.7217418711469055 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.72727273 0.90909091 0.8 0.66666667 0.85714286 0.9 0.72727273 0.9 0.71428571 0.77777778] mean value: 0.797950937950938 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.72727273 0.90909091 0.72727273 0.54545455 0.5 0.75 0.66666667 0.75 0.45454545 0.63636364] mean value: 0.6666666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73863636 0.91287879 0.78030303 0.64772727 0.70454545 0.82954545 0.6969697 0.82954545 0.63636364 0.72727273] mean value: 0.7503787878787879 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.57142857 0.83333333 0.61538462 0.42857143 0.46153846 0.69230769 0.53333333 0.69230769 0.38461538 0.53846154] mean value: 0.5751282051282052 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.31818891 1.31016517 1.28691626 1.37142372 1.3797493 1.26361561 1.31167197 1.2901237 1.27162528 1.29103398] mean value: 1.3094513893127442 key: score_time value: [0.09341335 0.08867025 0.09690428 0.09699655 0.09698176 0.08863115 0.09124899 0.0938971 0.09380794 0.09228921] mean value: 0.09328405857086182 key: test_mcc value: [0.58002308 0.91666667 0.91605722 0.47727273 0.76764947 0.65151515 0.91605722 0.91666667 0.81818182 0.83205029] mean value: 0.7792140323001392 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 0.95652174 0.95652174 0.73913043 0.86956522 0.82608696 0.95652174 0.95652174 0.90909091 0.90909091] mean value: 0.8861660079051383 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73684211 0.95652174 0.95238095 0.72727273 0.85714286 0.83333333 0.96 0.95652174 0.90909091 0.9 ] mean value: 0.8789106362744806 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.875 0.91666667 1. 0.72727273 1. 0.83333333 0.92307692 1. 0.90909091 1. ] mean value: 0.918444055944056 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 1. 0.90909091 0.72727273 0.75 0.83333333 1. 0.91666667 0.90909091 0.81818182] mean value: 0.85 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77651515 0.95833333 0.95454545 0.73863636 0.875 0.82575758 0.95454545 0.95833333 0.90909091 0.90909091] mean value: 0.8859848484848485 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58333333 0.91666667 0.90909091 0.57142857 0.75 0.71428571 0.92307692 0.91666667 0.83333333 0.81818182] mean value: 0.7936063936063936 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.28 Accuracy on Blind test: 0.62 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.92155814 0.86642456 0.9216938 0.91805387 0.88870335 0.92368603 0.926301 0.83419299 0.89840913 0.83216953] mean value: 0.8931192398071289 key: score_time value: [0.24260831 0.20261288 0.24647403 0.1979568 0.2496202 0.20900178 0.22739148 0.21628428 0.12800908 0.18758345] mean value: 0.21075422763824464 key: test_mcc value: [0.56490196 0.83971912 0.82575758 0.47727273 0.74242424 0.66414149 0.65909298 0.65151515 0.64715023 0.63636364] mean value: 0.6708339110699807 key: train_mcc value: [0.96097468 0.9516192 0.96170013 0.98048734 0.9707786 0.9707786 0.95163291 0.94219063 0.94245853 0.9613463 ] mean value: 0.9593966922193641 key: test_accuracy value: [0.7826087 0.91304348 0.91304348 0.73913043 0.86956522 0.82608696 0.82608696 0.82608696 0.81818182 0.81818182] mean value: 0.833201581027668 key: train_accuracy value: [0.9804878 0.97560976 0.9804878 0.9902439 0.98536585 0.98536585 0.97560976 0.97073171 0.97087379 0.98058252] mean value: 0.9795358749704002 key: test_fscore value: [0.76190476 0.91666667 0.90909091 0.72727273 0.86956522 0.81818182 0.84615385 0.83333333 0.83333333 0.81818182] mean value: 0.8333684431510519 key: train_fscore value: [0.98058252 0.97607656 0.98095238 0.99029126 0.98536585 0.98536585 0.97584541 0.97115385 0.97142857 0.98076923] mean value: 0.9797831488680813 key: test_precision value: [0.8 0.84615385 0.90909091 0.72727273 0.90909091 0.9 0.78571429 0.83333333 0.76923077 0.81818182] mean value: 0.8298068598068599 key: train_precision /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( value: [0.98058252 0.96226415 0.96261682 0.99029126 0.98058252 0.98058252 0.96190476 0.95283019 0.95327103 0.97142857] mean value: 0.9696354358374721 key: test_recall value: [0.72727273 1. 0.90909091 0.72727273 0.83333333 0.75 0.91666667 0.83333333 0.90909091 0.81818182] mean value: 0.8424242424242424 key: train_recall value: [0.98058252 0.99029126 1. 0.99029126 0.99019608 0.99019608 0.99019608 0.99019608 0.99029126 0.99029126] mean value: 0.9902531886541024 key: test_roc_auc value: [0.78030303 0.91666667 0.91287879 0.73863636 0.87121212 0.82954545 0.8219697 0.82575758 0.81818182 0.81818182] mean value: 0.8333333333333334 key: train_roc_auc value: [0.98048734 0.97553779 0.98039216 0.99024367 0.9853893 0.9853893 0.97568056 0.97082619 0.97087379 0.98058252] mean value: 0.9795402627070247 key: test_jcc value: [0.61538462 0.84615385 0.83333333 0.57142857 0.76923077 0.69230769 0.73333333 0.71428571 0.71428571 0.69230769] mean value: 0.7182051282051282 key: train_jcc value: [0.96190476 0.95327103 0.96261682 0.98076923 0.97115385 0.97115385 0.95283019 0.94392523 0.94444444 0.96226415] mean value: 0.9604333553160921 MCC on Blind test: 0.38 Accuracy on Blind test: 0.67 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02152944 0.00897455 0.00889492 0.0088706 0.00894356 0.0089643 0.00907016 0.00900245 0.00898051 0.00887012] mean value: 0.010210061073303222 key: score_time value: [0.01050448 0.00858045 0.00870132 0.00862598 0.00875401 0.00850987 0.00866985 0.00858259 0.00861263 0.00853896] mean value: 0.008808016777038574 key: test_mcc value: [0.21969697 0.55048188 0.22407133 0.21452908 0.3030303 0.3030303 0.33371191 0.39393939 0.09090909 0.32539569] mean value: 0.29587959510446155 key: train_mcc value: [0.44146616 0.44911432 0.45709726 0.49637007 0.4861007 0.48652841 0.43786483 0.44832571 0.49218702 0.50892419] mean value: 0.4703978666309494 key: test_accuracy value: [0.60869565 0.73913043 0.60869565 0.60869565 0.65217391 0.65217391 0.65217391 0.69565217 0.54545455 0.63636364] mean value: 0.6399209486166008 key: train_accuracy value: [0.71707317 0.72195122 0.72682927 0.74634146 0.73658537 0.74146341 0.71707317 0.72195122 0.74271845 0.75242718] mean value: 0.7324413923750888 key: test_fscore value: [0.60869565 0.625 0.47058824 0.52631579 0.66666667 0.66666667 0.6 0.69565217 0.54545455 0.5 ] mean value: 0.5905039729642637 key: train_fscore value: [0.69148936 0.70157068 0.71134021 0.73195876 0.7 0.72251309 0.69473684 0.6984127 0.71957672 0.7357513 ] mean value: 0.7107349655839269 key: test_precision value: [0.58333333 1. 0.66666667 0.625 0.66666667 0.66666667 0.75 0.72727273 0.54545455 0.8 ] mean value: 0.7031060606060606 key: train_precision value: [0.76470588 0.76136364 0.75824176 0.78021978 0.80769231 0.7752809 0.75 0.75862069 0.79069767 0.78888889] mean value: 0.7735711516709494 key: test_recall value: [0.63636364 0.45454545 0.36363636 0.45454545 0.66666667 0.66666667 0.5 0.66666667 0.54545455 0.36363636] mean value: 0.5318181818181817 key: train_recall value: [0.63106796 0.65048544 0.66990291 0.68932039 0.61764706 0.67647059 0.64705882 0.64705882 0.66019417 0.68932039] mean value: 0.657852655625357 key: test_roc_auc value: [0.60984848 0.72727273 0.59848485 0.60227273 0.65151515 0.65151515 0.65909091 0.6969697 0.54545455 0.63636364] mean value: 0.6378787878787878 key: train_roc_auc value: [0.71749476 0.72230154 0.72710832 0.74662098 0.736008 0.74114792 0.7167333 0.72158766 0.74271845 0.75242718] mean value: 0.732414810584428 key: test_jcc value: [0.4375 0.45454545 0.30769231 0.35714286 0.5 0.5 0.42857143 0.53333333 0.375 0.33333333] mean value: 0.42271187146187145 key: train_jcc value: [0.52845528 0.54032258 0.552 0.57723577 0.53846154 0.56557377 0.53225806 0.53658537 0.56198347 0.58196721] mean value: 0.5514843061067994 MCC on Blind test: 0.26 Accuracy on Blind test: 0.63 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.0944972 0.05067468 0.04957175 0.05115271 0.05678248 0.05602765 0.05684161 0.07069731 0.04913449 0.06043005] mean value: 0.05958099365234375 key: score_time value: [0.01044273 0.01050806 0.01055908 0.01056576 0.01026511 0.0102632 0.01027846 0.01120543 0.0102067 0.01039171] mean value: 0.010468626022338867 key: test_mcc value: [0.74047959 1. 0.91605722 0.6992059 0.83971912 0.83971912 0.91605722 0.91666667 1. 0.91287093] mean value: 0.8780775779868542 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 1. 0.95652174 0.82608696 0.91304348 0.91304348 0.95652174 0.95652174 1. 0.95454545] mean value: 0.9345849802371542 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 1. 0.95238095 0.84615385 0.90909091 0.90909091 0.96 0.95652174 1. 0.95238095] mean value: 0.9342762165370861 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 1. 1. 0.73333333 1. 1. 0.92307692 1. 1. 1. ] mean value: 0.9556410256410256 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 1. 0.90909091 1. 0.83333333 0.83333333 1. 0.91666667 1. 0.90909091] mean value: 0.921969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 1. 0.95454545 0.83333333 0.91666667 0.91666667 0.95454545 0.95833333 1. 0.95454545] mean value: 0.9356060606060607 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 1. 0.90909091 0.73333333 0.83333333 0.83333333 0.92307692 0.91666667 1. 0.90909091] mean value: 0.8807925407925408 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.53 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.0328002 0.05474067 0.06001735 0.05796957 0.03843188 0.02583218 0.05591583 0.06007648 0.05593419 0.05865741] mean value: 0.050037574768066403 key: score_time value: [0.02213311 0.02232766 0.02493095 0.0224731 0.01203299 0.01206088 0.02462769 0.0249722 0.02335501 0.01984525] mean value: 0.020875883102416993 key: test_mcc value: [0.56490196 0.58002308 0.91666667 0.47727273 0.5164589 0.48856385 0.56490196 0.58930667 0.63636364 0.45454545] mean value: 0.5789004892930631 key: train_mcc value: [0.91223227 0.96097468 0.91223227 0.93174679 0.97115114 0.95126131 0.95163291 0.93175328 0.94174757 0.94192516] mean value: 0.9406657392104807 key: test_accuracy value: [0.7826087 0.7826087 0.95652174 0.73913043 0.73913043 0.73913043 0.7826087 0.7826087 0.81818182 0.72727273] mean value: 0.7849802371541502 key: train_accuracy value: [0.95609756 0.9804878 0.95609756 0.96585366 0.98536585 0.97560976 0.97560976 0.96585366 0.97087379 0.97087379] mean value: 0.9702723182571631 key: test_fscore value: [0.76190476 0.73684211 0.95652174 0.72727273 0.7 0.72727273 0.8 0.76190476 0.81818182 0.72727273] mean value: 0.7717173368203116 key: train_fscore value: [0.95652174 0.98058252 0.95652174 0.96618357 0.98550725 0.97536946 0.97584541 0.96585366 0.97087379 0.97115385] mean value: 0.9704412983643049 key: test_precision value: [0.8 0.875 0.91666667 0.72727273 0.875 0.8 0.76923077 0.88888889 0.81818182 0.72727273] mean value: 0.8197513597513597 key: train_precision value: [0.95192308 0.98058252 0.95192308 0.96153846 0.97142857 0.98019802 0.96190476 0.96116505 0.97087379 0.96190476] mean value: 0.9653442089647992 key: test_recall value: [0.72727273 0.63636364 1. 0.72727273 0.58333333 0.66666667 0.83333333 0.66666667 0.81818182 0.72727273] mean value: 0.7386363636363636 key: train_recall value: [0.96116505 0.98058252 0.96116505 0.97087379 1. 0.97058824 0.99019608 0.97058824 0.97087379 0.98058252] mean value: 0.975661526746621 key: test_roc_auc value: [0.78030303 0.77651515 0.95833333 0.73863636 0.74621212 0.74242424 0.78030303 0.78787879 0.81818182 0.72727273] mean value: 0.7856060606060605 key: train_roc_auc value: [0.95607272 0.98048734 0.95607272 0.96582905 0.98543689 0.97558538 0.97568056 0.96587664 0.97087379 0.97087379] mean value: 0.970278888254331 key: test_jcc value: [0.61538462 0.58333333 0.91666667 0.57142857 0.53846154 0.57142857 0.66666667 0.61538462 0.69230769 0.57142857] mean value: 0.6342490842490842 key: train_jcc value: [0.91666667 0.96190476 0.91666667 0.93457944 0.97142857 0.95192308 0.95283019 0.93396226 0.94339623 0.94392523] mean value: 0.9427283095732223 MCC on Blind test: 0.13 Accuracy on Blind test: 0.56 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01869988 0.0104475 0.01018763 0.01002216 0.00952911 0.01009011 0.01007676 0.01010776 0.00896454 0.01016378] mean value: 0.010828924179077149 key: score_time value: [0.00920248 0.00986409 0.00962067 0.0095036 0.00955296 0.00947595 0.00958252 0.00949979 0.00944614 0.00946665] mean value: 0.009521484375 key: test_mcc value: [0.06579517 0.47727273 0.56490196 0.21969697 0.22407133 0.39727608 0.56818182 0.38932432 0.54772256 0.46225016] mean value: 0.3916493092720405 key: train_mcc value: [0.48780456 0.42066716 0.48336719 0.46806514 0.42940367 0.42714207 0.40668817 0.42940367 0.41216105 0.42138641] mean value: 0.43860890996498425 key: test_accuracy value: [0.52173913 0.73913043 0.7826087 0.60869565 0.60869565 0.69565217 0.7826087 0.69565217 0.77272727 0.72727273] mean value: 0.6934782608695652 key: train_accuracy value: [0.74146341 0.70731707 0.74146341 0.73170732 0.71219512 0.71219512 0.70243902 0.71219512 0.7038835 0.70873786] mean value: 0.7173596968979399 key: test_fscore value: [0.59259259 0.72727273 0.76190476 0.60869565 0.68965517 0.74074074 0.7826087 0.72 0.7826087 0.75 ] mean value: 0.7156079038402876 key: train_fscore value: [0.760181 0.73214286 0.74881517 0.75113122 0.73059361 0.7255814 0.71361502 0.73059361 0.7239819 0.72727273] mean value: 0.7343908501374309 key: test_precision value: [0.5 0.72727273 0.8 0.58333333 0.58823529 0.66666667 0.81818182 0.69230769 0.75 0.69230769] mean value: 0.6818305224187577 key: train_precision value: [0.71186441 0.67768595 0.73148148 0.70338983 0.68376068 0.69026549 0.68468468 0.68376068 0.6779661 0.68376068] mean value: 0.6928619993570155 key: test_recall value: [0.72727273 0.72727273 0.72727273 0.63636364 0.83333333 0.83333333 0.75 0.75 0.81818182 0.81818182] mean value: 0.7621212121212122 key: train_recall value: [0.81553398 0.7961165 0.76699029 0.80582524 0.78431373 0.76470588 0.74509804 0.78431373 0.77669903 0.77669903] mean value: 0.7816295450218922 key: test_roc_auc value: [0.53030303 0.73863636 0.78030303 0.60984848 0.59848485 0.68939394 0.78409091 0.69318182 0.77272727 0.72727273] mean value: 0.6924242424242424 key: train_roc_auc value: [0.74110032 0.70688178 0.74133828 0.73134399 0.71254521 0.71245003 0.70264611 0.71254521 0.7038835 0.70873786] mean value: 0.7173472301541977 key: test_jcc value: [0.42105263 0.57142857 0.61538462 0.4375 0.52631579 0.58823529 0.64285714 0.5625 0.64285714 0.6 ] mean value: 0.5608131187697751 key: train_jcc value: [0.61313869 0.57746479 0.59848485 0.60144928 0.57553957 0.56934307 0.55474453 0.57553957 0.56737589 0.57142857] mean value: 0.5804508784595866 MCC on Blind test: 0.41 Accuracy on Blind test: 0.7 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01363993 0.01537251 0.01528311 0.01538348 0.01859713 0.01800203 0.01694918 0.01621366 0.01610088 0.01708269] mean value: 0.016262459754943847 key: score_time value: [0.00964856 0.0117321 0.01158285 0.01170731 0.01168036 0.01170659 0.01175737 0.0116291 0.01162767 0.01166821] mean value: 0.011474013328552246 key: test_mcc value: [0.69084928 0.22268089 0.50168817 0.31252706 0.50460839 0.82575758 0.83971912 0.74047959 0.39735971 0.54232614] mean value: 0.5577995920530833 key: train_mcc value: [0.70109302 0.51269395 0.79525817 0.73218681 0.58583388 0.88020643 0.75526392 0.86303792 0.57361333 0.82977382] mean value: 0.7228961254133855 key: test_accuracy value: [0.82608696 0.56521739 0.73913043 0.65217391 0.69565217 0.91304348 0.91304348 0.86956522 0.63636364 0.72727273] mean value: 0.7537549407114624 key: train_accuracy value: [0.82926829 0.70731707 0.89268293 0.84878049 0.75609756 0.93658537 0.86341463 0.92682927 0.74757282 0.90776699] mean value: 0.841631541558134 key: test_fscore value: [0.77777778 0.16666667 0.66666667 0.55555556 0.58823529 0.91666667 0.90909091 0.88 0.42857143 0.625 ] mean value: 0.6514230965113318 key: train_fscore value: [0.79532164 0.5890411 0.88421053 0.82285714 0.67532468 0.93193717 0.84090909 0.93150685 0.66233766 0.89839572] mean value: 0.8031841575076744 key: test_precision value: [1. 1. 0.85714286 0.71428571 1. 0.91666667 1. 0.84615385 1. 1. ] mean value: 0.9334249084249084 key: train_precision value: [1. 1. 0.96551724 1. 1. 1. 1. 0.87179487 1. 1. ] mean value: 0.9837312113174183 key: test_recall value: [0.63636364 0.09090909 0.54545455 0.45454545 0.41666667 0.91666667 0.83333333 0.91666667 0.27272727 0.45454545] mean value: 0.5537878787878788 key: train_recall value: [0.66019417 0.41747573 0.81553398 0.69902913 0.50980392 0.87254902 0.7254902 1. 0.49514563 0.81553398] mean value: 0.7010755758614126 key: test_roc_auc value: [0.81818182 0.54545455 0.73106061 0.64393939 0.70833333 0.91287879 0.91666667 0.86742424 0.63636364 0.72727273] mean value: 0.7507575757575757 key: train_roc_auc value: [0.83009709 0.70873786 0.89306111 0.84951456 0.75490196 0.93627451 0.8627451 0.92718447 0.74757282 0.90776699] mean value: 0.841785646297354 key: test_jcc value: [0.63636364 0.09090909 0.5 0.38461538 0.41666667 0.84615385 0.83333333 0.78571429 0.27272727 0.45454545] mean value: 0.5221028971028971 key: train_jcc value: [0.66019417 0.41747573 0.79245283 0.69902913 0.50980392 0.87254902 0.7254902 0.87179487 0.49514563 0.81553398] mean value: 0.6859469480015152 MCC on Blind test: 0.18 Accuracy on Blind test: 0.59 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01696992 0.01531029 0.01493359 0.01531291 0.01478624 0.01416588 0.01495624 0.01490998 0.01478863 0.01566005] mean value: 0.01517937183380127 key: score_time value: [0.01180148 0.01169944 0.01167941 0.01164746 0.01165843 0.01172638 0.01163912 0.01172733 0.01157904 0.01167846] mean value: 0.01168365478515625 key: test_mcc value: [0.39393939 0.6992059 0.32232919 0.56879646 0.76764947 0.82575758 0.76764947 0.91666667 0.64715023 0.23570226] mean value: 0.6144846616230837 key: train_mcc value: [0.87817847 0.81217608 0.3623663 0.70796649 0.92194936 0.86485629 0.66933669 0.8742382 0.85045167 0.56613852] mean value: 0.7507658057776959 key: test_accuracy value: [0.69565217 0.82608696 0.60869565 0.73913043 0.86956522 0.91304348 0.86956522 0.95652174 0.81818182 0.59090909] mean value: 0.7887351778656126 key: train_accuracy value: [0.93658537 0.89756098 0.61463415 0.83414634 0.96097561 0.93170732 0.8097561 0.93658537 0.9223301 0.74271845] mean value: 0.8586999763201516 key: test_fscore value: [0.69565217 0.84615385 0.30769231 0.78571429 0.85714286 0.91666667 0.85714286 0.95652174 0.8 0.68965517] mean value: 0.7712341905970092 key: train_fscore value: [0.94009217 0.90748899 0.37795276 0.85833333 0.96078431 0.92929293 0.76363636 0.93779904 0.91752577 0.7953668 ] mean value: 0.8388272460201259 key: test_precision value: [0.66666667 0.73333333 1. 0.64705882 1. 0.91666667 1. 1. 0.88888889 0.55555556] mean value: 0.8408169934640523 key: train_precision value: [0.89473684 0.83064516 1. 0.75182482 0.96078431 0.95833333 1. 0.91588785 0.97802198 0.66025641] mean value: 0.8950490706718336 key: test_recall value: [0.72727273 1. 0.18181818 1. 0.75 0.91666667 0.75 0.91666667 0.72727273 0.90909091] mean value: 0.7878787878787878 key: train_recall value: [0.99029126 1. 0.23300971 1. 0.96078431 0.90196078 0.61764706 0.96078431 0.86407767 1. ] mean value: 0.8528555111364935 key: test_roc_auc value: [0.6969697 0.83333333 0.59090909 0.75 0.875 0.91287879 0.875 0.95833333 0.81818182 0.59090909] mean value: 0.7901515151515152 key: train_roc_auc value: [0.9363221 0.89705882 0.61650485 0.83333333 0.96097468 0.93156292 0.80882353 0.93670284 0.9223301 0.74271845] mean value: 0.8586331620026652 key: test_jcc value: [0.53333333 0.73333333 0.18181818 0.64705882 0.75 0.84615385 0.75 0.91666667 0.66666667 0.52631579] mean value: 0.6551346640975124 key: train_jcc value: [0.88695652 0.83064516 0.23300971 0.75182482 0.9245283 0.86792453 0.61764706 0.88288288 0.84761905 0.66025641] mean value: 0.7503294439056115 MCC on Blind test: 0.2 Accuracy on Blind test: 0.6 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.12839508 0.113796 0.11646557 0.11455917 0.1184082 0.11943507 0.11837411 0.11078691 0.11046553 0.11171436] mean value: 0.11624000072479249 key: score_time value: [0.01480055 0.01611924 0.01634765 0.01499295 0.01620007 0.01611018 0.01495361 0.0148952 0.01476741 0.01726556] mean value: 0.01564524173736572 key: test_mcc value: [0.74047959 0.82575758 0.91605722 0.66414149 0.83971912 0.91666667 0.91605722 0.83971912 0.81818182 0.91287093] mean value: 0.8389650763028634 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86956522 0.91304348 0.95652174 0.82608696 0.91304348 0.95652174 0.95652174 0.91304348 0.90909091 0.95454545] mean value: 0.916798418972332 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.90909091 0.95238095 0.83333333 0.90909091 0.95652174 0.96 0.90909091 0.90909091 0.95238095] mean value: 0.9148123470732166 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.90909091 1. 0.76923077 1. 1. 0.92307692 1. 0.90909091 1. ] mean value: 0.941048951048951 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.81818182 0.90909091 0.90909091 0.90909091 0.83333333 0.91666667 1. 0.83333333 0.90909091 0.90909091] mean value: 0.8946969696969697 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.86742424 0.91287879 0.95454545 0.82954545 0.91666667 0.95833333 0.95454545 0.91666667 0.90909091 0.95454545] mean value: 0.9174242424242425 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.83333333 0.90909091 0.71428571 0.83333333 0.91666667 0.92307692 0.83333333 0.83333333 0.90909091] mean value: 0.8455544455544456 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.01 Accuracy on Blind test: 0.5 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.04284787 0.04148149 0.05290413 0.04513884 0.03626966 0.05176854 0.04080129 0.04704714 0.04275584 0.04380989] mean value: 0.04448246955871582 key: score_time value: [0.01655555 0.02902532 0.01787877 0.02407384 0.01867771 0.02897787 0.01784182 0.03749013 0.01835752 0.02550364] mean value: 0.023438215255737305 key: test_mcc value: [0.74047959 0.83743579 0.91605722 0.58930667 0.76764947 0.83971912 0.91605722 1. 1. 0.81818182] mean value: 0.8424886910191745 key: train_mcc value: [0.98067587 0.98067587 1. 1. 1. 1. 0.99029126 0.99029034 0.99033794 0.99033794] mean value: 0.9922609226032173 key: test_accuracy value: [0.86956522 0.91304348 0.95652174 0.7826087 0.86956522 0.91304348 0.95652174 1. 1. 0.90909091] mean value: 0.9169960474308301 key: train_accuracy value: [0.9902439 0.9902439 1. 1. 1. 1. 0.99512195 0.99512195 0.99514563 0.99514563] mean value: 0.9961022969452995 key: test_fscore value: [0.85714286 0.9 0.95238095 0.8 0.85714286 0.90909091 0.96 1. 1. 0.90909091] mean value: 0.9144848484848485 key: train_fscore value: [0.99019608 0.99019608 1. 1. 1. 1. 0.99512195 0.99507389 0.99516908 0.99516908] mean value: 0.9960926163959081 key: test_precision value: [0.9 1. 1. 0.71428571 1. 1. 0.92307692 1. 1. 0.90909091] mean value: 0.9446453546453546 key: train_precision value: [1. 1. 1. 1. 1. 1. 0.99029126 1. 0.99038462 0.99038462] mean value: 0.9971060492905153 key: test_recall value: [0.81818182 0.81818182 0.90909091 0.90909091 0.75 0.83333333 1. 1. 1. 0.90909091] mean value: 0.8946969696969697 key: train_recall value: [0.98058252 0.98058252 1. 1. 1. 1. 1. 0.99019608 1. 1. ] mean value: 0.9951361126975062 key: test_roc_auc value: [0.86742424 0.90909091 0.95454545 0.78787879 0.875 0.91666667 0.95454545 1. 1. 0.90909091] mean value: 0.9174242424242425 key: train_roc_auc value: [0.99029126 0.99029126 1. 1. 1. 1. 0.99514563 0.99509804 0.99514563 0.99514563] mean value: 0.9961117456691414 key: test_jcc value: [0.75 0.81818182 0.90909091 0.66666667 0.75 0.83333333 0.92307692 1. 1. 0.83333333] mean value: 0.8483682983682984 key: train_jcc value: [0.98058252 0.98058252 1. 1. 1. 1. 0.99029126 0.99019608 0.99038462 0.99038462] mean value: 0.9922421619880215 MCC on Blind test: 0.13 Accuracy on Blind test: 0.55 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.04829431 0.08197236 0.07691717 0.05902553 0.02872419 0.02841234 0.06640029 0.04778814 0.02781248 0.0365293 ] mean value: 0.05018761157989502 key: score_time value: [0.02258968 0.02218485 0.02187991 0.01301789 0.01300526 0.01682043 0.01924825 0.01270652 0.01270413 0.02137733] mean value: 0.017553424835205077 key: test_mcc value: [0.3030303 0.83743579 0.31252706 0.12406456 0.56818182 0.47727273 0.41096386 0.82575758 0.48795004 0.2773501 ] mean value: 0.462453382427294 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.65217391 0.91304348 0.65217391 0.56521739 0.7826087 0.73913043 0.69565217 0.91304348 0.72727273 0.63636364] mean value: 0.7276679841897233 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.63636364 0.9 0.55555556 0.5 0.7826087 0.75 0.66666667 0.91666667 0.66666667 0.6 ] mean value: 0.6974527887571366 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.63636364 1. 0.71428571 0.55555556 0.81818182 0.75 0.77777778 0.91666667 0.85714286 0.66666667] mean value: 0.7692640692640693 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.63636364 0.81818182 0.45454545 0.45454545 0.75 0.75 0.58333333 0.91666667 0.54545455 0.54545455] mean value: 0.6454545454545455 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.65151515 0.90909091 0.64393939 0.56060606 0.78409091 0.73863636 0.70075758 0.91287879 0.72727273 0.63636364] mean value: 0.7265151515151516 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.46666667 0.81818182 0.38461538 0.33333333 0.64285714 0.6 0.5 0.84615385 0.5 0.42857143] mean value: 0.5520379620379621 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.22 Accuracy on Blind test: 0.61 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.37749887 0.35624838 0.34953141 0.35143161 0.35564804 0.35179186 0.34444571 0.35502386 0.34801006 0.35423851] mean value: 0.35438683032989504 key: score_time value: [0.00946093 0.00907135 0.00899053 0.00892138 0.00922036 0.00899267 0.00900412 0.0091064 0.00907969 0.0090704 ] mean value: 0.009091782569885253 key: test_mcc value: [0.91666667 1. 0.91605722 0.6992059 0.76764947 1. 0.91605722 1. 1. 0.91287093] mean value: 0.912850741785816 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95652174 1. 0.95652174 0.82608696 0.86956522 1. 0.95652174 1. 1. 0.95454545] mean value: 0.9519762845849803 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.95652174 1. 0.95238095 0.84615385 0.85714286 1. 0.96 1. 1. 0.95238095] mean value: 0.9524580347189042 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.91666667 1. 1. 0.73333333 1. 1. 0.92307692 1. 1. 1. ] mean value: 0.9573076923076923 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.90909091 1. 0.75 1. 1. 1. 1. 0.90909091] mean value: 0.9568181818181818 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95833333 1. 0.95454545 0.83333333 0.875 1. 0.95454545 1. 1. 0.95454545] mean value: 0.953030303030303 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.91666667 1. 0.90909091 0.73333333 0.75 1. 0.92307692 1. 1. 0.90909091] mean value: 0.9141258741258741 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.54 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01773167 0.01976418 0.03433776 0.01997304 0.01990628 0.02007365 0.02028775 0.0202179 0.02021074 0.0203495 ] mean value: 0.021285247802734376 key: score_time value: [0.01196408 0.014189 0.01221085 0.01400971 0.01760888 0.02596092 0.01817083 0.01999259 0.0198195 0.0188601 ] mean value: 0.017278647422790526 key: test_mcc value: [0.63327851 0.83971912 0.76764947 0.43929769 0.76277007 0.62050523 0.62050523 0.83743579 0.68313005 0.61237244] mean value: 0.6816663591347039 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.7826087 0.91304348 0.86956522 0.65217391 0.86956522 0.7826087 0.7826087 0.91304348 0.81818182 0.77272727] mean value: 0.8156126482213438 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.81481481 0.91666667 0.88 0.73333333 0.88888889 0.82758621 0.82758621 0.92307692 0.84615385 0.81481481] mean value: 0.8472921701542391 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6875 0.84615385 0.78571429 0.57894737 0.8 0.70588235 0.70588235 0.85714286 0.73333333 0.6875 ] mean value: 0.7388056396647728 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.79166667 0.91666667 0.875 0.66666667 0.86363636 0.77272727 0.77272727 0.90909091 0.81818182 0.77272727] mean value: 0.8159090909090909 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6875 0.84615385 0.78571429 0.57894737 0.8 0.70588235 0.70588235 0.85714286 0.73333333 0.6875 ] mean value: 0.7388056396647728 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.0230031 0.03526139 0.03538561 0.02961564 0.03531909 0.03540421 0.03581238 0.03637147 0.03165317 0.03724432] mean value: 0.033507037162780764 key: score_time value: [0.01678348 0.02226853 0.02215648 0.02117872 0.02216363 0.02219868 0.02224374 0.02224016 0.02391219 0.02239656] mean value: 0.02175421714782715 key: test_mcc value: [0.82575758 0.74242424 0.65909298 0.65151515 0.76764947 0.74047959 0.82575758 0.82575758 0.73029674 0.46225016] mean value: 0.7230981074087546 key: train_mcc value: [0.92263761 0.90259929 0.93209539 0.93209539 0.95236324 0.92213232 0.92213232 0.903143 0.92250402 0.92302639] mean value: 0.9234728990174977 key: test_accuracy value: [0.91304348 0.86956522 0.82608696 0.82608696 0.86956522 0.86956522 0.91304348 0.91304348 0.86363636 0.72727273] mean value: 0.8590909090909091 key: train_accuracy value: [0.96097561 0.95121951 0.96585366 0.96585366 0.97560976 0.96097561 0.96097561 0.95121951 0.96116505 0.96116505] mean value: 0.9615013023916646 key: test_fscore value: [0.90909091 0.86956522 0.8 0.81818182 0.85714286 0.88 0.91666667 0.91666667 0.86956522 0.7 ] mean value: 0.8536879352531526 key: train_fscore value: [0.96190476 0.95192308 0.96650718 0.96650718 0.97607656 0.96116505 0.96116505 0.95192308 0.96153846 0.96190476] mean value: 0.9620615145372428 key: test_precision value: [0.90909091 0.83333333 0.88888889 0.81818182 1. 0.84615385 0.91666667 0.91666667 0.83333333 0.77777778] mean value: 0.874009324009324 key: train_precision value: [0.94392523 0.94285714 0.95283019 0.95283019 0.95327103 0.95192308 0.95192308 0.93396226 0.95238095 0.94392523] mean value: 0.9479828385920785 key: test_recall value: [0.90909091 0.90909091 0.72727273 0.81818182 0.75 0.91666667 0.91666667 0.91666667 0.90909091 0.63636364] mean value: 0.8409090909090909 key: train_recall value: [0.98058252 0.96116505 0.98058252 0.98058252 1. 0.97058824 0.97058824 0.97058824 0.97087379 0.98058252] /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:195: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_orig.py:198: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) mean value: 0.9766133637921188 key: test_roc_auc value: [0.91287879 0.87121212 0.8219697 0.82575758 0.875 0.86742424 0.91287879 0.91287879 0.86363636 0.72727273] mean value: 0.859090909090909 key: train_roc_auc value: [0.9608795 0.95117076 0.96578146 0.96578146 0.97572816 0.96102227 0.96102227 0.95131354 0.96116505 0.96116505] mean value: 0.961502950694841 key: test_jcc value: [0.83333333 0.76923077 0.66666667 0.69230769 0.75 0.78571429 0.84615385 0.84615385 0.76923077 0.53846154] mean value: 0.7497252747252747 key: train_jcc value: [0.9266055 0.90825688 0.93518519 0.93518519 0.95327103 0.92523364 0.92523364 0.90825688 0.92592593 0.9266055 ] mean value: 0.9269759384695507 MCC on Blind test: 0.21 Accuracy on Blind test: 0.6 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.31841111 0.24217796 0.23903775 0.24415827 0.23731065 0.22925878 0.26342988 0.2629621 0.31826568 0.26236701] mean value: 0.26173791885375974 key: score_time value: [0.02236819 0.02346969 0.02272415 0.02223444 0.0223434 0.02419353 0.02253866 0.025352 0.02559161 0.02376676] mean value: 0.023458242416381836 key: test_mcc value: [0.65151515 0.56490196 0.65909298 0.65151515 0.66414149 0.74047959 0.74242424 0.82575758 0.63636364 0.36514837] mean value: 0.6501340144993133 key: train_mcc value: [0.92211753 0.92263761 0.93209539 0.93209539 0.9707786 0.92213232 0.92213232 0.903143 0.95150116 0.94192516] mean value: 0.932055849012585 key: test_accuracy value: [0.82608696 0.7826087 0.82608696 0.82608696 0.82608696 0.86956522 0.86956522 0.91304348 0.81818182 0.68181818] mean value: 0.8239130434782609 key: train_accuracy value: [0.96097561 0.96097561 0.96585366 0.96585366 0.98536585 0.96097561 0.96097561 0.95121951 0.97572816 0.97087379] mean value: 0.9658797063698792 key: test_fscore value: [0.81818182 0.76190476 0.8 0.81818182 0.81818182 0.88 0.86956522 0.91666667 0.81818182 0.66666667] mean value: 0.8167530585356673 key: train_fscore value: [0.96153846 0.96190476 0.96650718 0.96650718 0.98536585 0.96116505 0.96116505 0.95192308 0.97584541 0.97115385] mean value: 0.9663075861961067 key: test_precision value: [0.81818182 0.8 0.88888889 0.81818182 0.9 0.84615385 0.90909091 0.91666667 0.81818182 0.7 ] mean value: 0.8415345765345765 key: train_precision value: [0.95238095 0.94392523 0.95283019 0.95283019 0.98058252 0.95192308 0.95192308 0.93396226 0.97115385 0.96190476] mean value: 0.9553416113711852 key: test_recall value: [0.81818182 0.72727273 0.72727273 0.81818182 0.75 0.91666667 0.83333333 0.91666667 0.81818182 0.63636364] mean value: 0.7962121212121213 key: train_recall value: [0.97087379 0.98058252 0.98058252 0.98058252 0.99019608 0.97058824 0.97058824 0.97058824 0.98058252 0.98058252] mean value: 0.9775747192080716 key: test_roc_auc value: [0.82575758 0.78030303 0.8219697 0.82575758 0.82954545 0.86742424 0.87121212 0.91287879 0.81818182 0.68181818] mean value: 0.8234848484848485 key: train_roc_auc value: [0.96092709 0.9608795 0.96578146 0.96578146 0.9853893 0.96102227 0.96102227 0.95131354 0.97572816 0.97087379] mean value: 0.965871882733676 key: test_jcc value: [0.69230769 0.61538462 0.66666667 0.69230769 0.69230769 0.78571429 0.76923077 0.84615385 0.69230769 0.5 ] mean value: 0.6952380952380952 key: train_jcc value: [0.92592593 0.9266055 0.93518519 0.93518519 0.97115385 0.92523364 0.92523364 0.90825688 0.95283019 0.94392523] mean value: 0.9349535239814974 MCC on Blind test: 0.11 Accuracy on Blind test: 0.55