/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data_8020.py:549: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 424
PASS: my_features_df and aa_df successfully combined
nrows: 424
ncols: 265
count of NULL values before imputation
or_mychisq          102
log10_or_mychisq    102
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
Total no. of features for aaindex: 123
No. of numerical features: 166
No. of categorical features: 7
PASS: x_features has no target variable
No. of columns for x_features: 173
-------------------------------------------------------------
Successfully split data with stratification: 80/20
Train data size: (148, 173)
Test data size: (37, 173)
y_train numbers: Counter({1: 91, 0: 57})
y_train ratio: 0.6263736263736264
y_test_numbers: Counter({1: 23, 0: 14})
y_test ratio: 0.6086956521739131
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 91, 1: 91})
(182, 173)
Simple Random UnderSampling
Counter({0: 57, 1: 57})
(114, 173)
Simple Combined Over and UnderSampling
Counter({0: 91, 1: 91})
(182, 173)
SMOTE_NC OverSampling
Counter({0: 91, 1: 91})
(182, 173)
#####################################################################
Running ML analysis: 80/20 split
Gene name: pncA
Drug name: pyrazinamide
Output directory: /home/tanu/git/Data/pyrazinamide/output/ml/tts_8020/
Sanity checks:
ML source data size: (185, 173)
Total input features: (148, 173)
Target feature numbers: Counter({1: 91, 0: 57})
Target features ratio: 0.6263736263736264
#####################################################################
================================================================
Strucutral features (n): 34
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
AAindex features (n): 123
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.03000379 0.02730155 0.02838016 0.02717137 0.03072381 0.03129315 0.03094292 0.03128314 0.03002071 0.0396781 ]
mean value: 0.030679869651794433
key: score_time
value: [0.01214504 0.01164198 0.01170969 0.01179552 0.01176286 0.01186895 0.01166534 0.01296544 0.01171398 0.01177335]
mean value: 0.011904215812683106
key: test_mcc
value: [0.43082022 0.27216553 0.27216553 0.38888889 0.43082022 0.43082022 0.28867513 0. 0.70064905 0.54772256]
mean value: 0.376272733996905
key: train_mcc
value: [0.84034551 0.87406606 0.74333704 0.79198044 0.82449074 0.84138381 0.85700105 0.79479796 0.77889634 0.84234132]
mean value: 0.8188640257068358
key: test_accuracy
value: [0.73333333 0.66666667 0.66666667 0.66666667 0.73333333 0.73333333 0.66666667 0.53333333 0.85714286 0.78571429]
mean value: 0.7042857142857143
key: train_accuracy
value: [0.92481203 0.93984962 0.87969925 0.90225564 0.91729323 0.92481203 0.93233083 0.90225564 0.89552239 0.92537313]
mean value: 0.9144203793064751
key: test_fscore
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
[0.8 0.76190476 0.76190476 0.66666667 0.8 0.8 0.73684211 0.63157895 0.9 0.85714286] mean value: 0.7716040100250626 key: train_fscore value: [0.94047619 0.95294118 0.90588235 0.92307692 0.93491124 0.94117647 0.94674556 0.92307692 0.91764706 0.94047619] mean value: 0.9326410090663484 key: test_precision value: [0.72727273 0.66666667 0.66666667 0.83333333 0.72727273 0.72727273 0.7 0.66666667 0.81818182 0.75 ] mean value: 0.7283333333333333 key: train_precision value: [0.91860465 0.92045455 0.875 0.89655172 0.90804598 0.90909091 0.91954023 0.88636364 0.88636364 0.91860465] mean value: 0.9038619960632791 key: test_recall value: [0.88888889 0.88888889 0.88888889 0.55555556 0.88888889 0.88888889 0.77777778 0.6 1. 1. 
] mean value: 0.8377777777777777 key: train_recall value: [0.96341463 0.98780488 0.93902439 0.95121951 0.96341463 0.97560976 0.97560976 0.96296296 0.95121951 0.96341463] mean value: 0.9633694670280035 key: test_roc_auc value: [0.69444444 0.61111111 0.61111111 0.69444444 0.69444444 0.69444444 0.63888889 0.5 0.8 0.7 ] mean value: 0.6638888888888889 key: train_roc_auc value: [0.91307987 0.92527499 0.86166906 0.88737446 0.90327594 0.90937351 0.91917743 0.88532764 0.87945591 0.91439962] mean value: 0.8998408421112869 key: test_jcc value: [0.66666667 0.61538462 0.61538462 0.5 0.66666667 0.66666667 0.58333333 0.46153846 0.81818182 0.75 ] mean value: 0.6343822843822844 key: train_jcc value: [0.88764045 0.91011236 0.82795699 0.85714286 0.87777778 0.88888889 0.8988764 0.85714286 0.84782609 0.88764045] mean value: 0.8741005120077563 MCC on Blind test: 0.54 Accuracy on Blind test: 0.78 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.6358583 0.63472962 0.77536225 0.86090446 0.62058043 0.62573814 0.72684979 1.00338984 0.77752471 1.01283073] mean value: 0.7673768281936646 key: score_time value: [0.01333785 0.01501322 0.01294947 0.01333427 0.01240039 0.01354885 0.01353598 0.01332974 0.01322675 0.01212215] mean value: 0.013279867172241212 key: test_mcc value: [0.28867513 0.28867513 0.16666667 0.49099025 0.73854895 0.6000992 0.44444444 0.28867513 0.86066297 0.54772256] mean value: 0.4715160435280545 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.66666667 0.6 0.73333333 0.86666667 0.8 0.73333333 0.6 0.92857143 0.78571429] mean value: 0.7380952380952381 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73684211 0.73684211 0.66666667 0.75 0.9 0.82352941 0.77777778 0.625 0.94117647 0.85714286] mean value: 0.7814977394466558 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.7 0.66666667 0.85714286 0.81818182 0.875 0.77777778 0.83333333 1. 0.75 ] mean value: 0.7978102453102454 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 0.77777778 0.66666667 0.66666667 1. 0.77777778 0.77777778 0.5 0.88888889 1. ] mean value: 0.7833333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.63888889 0.63888889 0.58333333 0.75 0.83333333 0.80555556 0.72222222 0.65 0.94444444 0.7 ] mean value: 0.7266666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58333333 0.58333333 0.5 0.6 0.81818182 0.7 0.63636364 0.45454545 0.88888889 0.75 ] mean value: 0.6514646464646464 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01237035 0.01113296 0.00974202 0.00966811 0.0096302 0.00947905 0.00960612 0.00956416 0.0093956 0.00943828] mean value: 0.010002684593200684 key: score_time value: [0.01165819 0.00971031 0.00957322 0.00939584 0.00926685 0.00922227 0.0092895 0.00912857 0.00913 0.00887775] mean value: 0.009525251388549805 key: test_mcc value: [ 0.43082022 0. 0.27216553 0.12309149 0.05455447 0.61237244 0.08006408 -0.18898224 0.33734954 0. 
] mean value: 0.1721435527505622 key: train_mcc value: [0.54501213 0.42534 0.56649197 0.40644472 0.42721465 0.36528121 0.44773865 0.43164105 0.38378759 0.4226252 ] mean value: 0.4421577185051267 key: test_accuracy value: [0.73333333 0.6 0.66666667 0.6 0.53333333 0.8 0.6 0.6 0.71428571 0.64285714] mean value: 0.6490476190476191 key: train_accuracy value: [0.78947368 0.73684211 0.79699248 0.72932331 0.73684211 0.69924812 0.7443609 0.72180451 0.71641791 0.7238806 ] mean value: 0.7395185725507799 key: test_fscore value: [0.8 0.75 0.76190476 0.7 0.58823529 0.85714286 0.72727273 0.75 0.8 0.7826087 ] mean value: 0.7517164336090167 key: train_fscore value: [0.84090909 0.81081081 0.83832335 0.80434783 0.81283422 0.8019802 0.81914894 0.81218274 0.8 0.81218274] mean value: 0.8152719922122719 key: test_precision value: [0.72727273 0.6 0.66666667 0.63636364 0.625 0.75 0.61538462 0.64285714 0.72727273 0.64285714] mean value: 0.6633674658674659 key: train_precision value: [0.78723404 0.72815534 0.82352941 0.7254902 0.72380952 0.675 0.72641509 0.68965517 0.7037037 0.69565217] mean value: 0.7278644658381841 key: test_recall value: [0.88888889 1. 0.88888889 0.77777778 0.55555556 1. 0.88888889 0.9 0.88888889 1. 
] mean value: 0.8788888888888888 key: train_recall value: [0.90243902 0.91463415 0.85365854 0.90243902 0.92682927 0.98780488 0.93902439 0.98765432 0.92682927 0.97560976] mean value: 0.9316922613670581 key: test_roc_auc value: [0.69444444 0.5 0.61111111 0.55555556 0.52777778 0.75 0.52777778 0.45 0.64444444 0.5 ] mean value: 0.5761111111111111 key: train_roc_auc value: [0.75514108 0.68280727 0.77977044 0.67670971 0.67910091 0.6115495 0.68519847 0.64767331 0.65572233 0.65126642] mean value: 0.6824939436548714 key: test_jcc value: [0.66666667 0.6 0.61538462 0.53846154 0.41666667 0.75 0.57142857 0.6 0.66666667 0.64285714] mean value: 0.6068131868131869 key: train_jcc value: [0.7254902 0.68181818 0.72164948 0.67272727 0.68468468 0.66942149 0.69369369 0.68376068 0.66666667 0.68376068] mean value: 0.6883673035329687 MCC on Blind test: 0.26 Accuracy on Blind test: 0.68 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, 
min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00984406 0.00968909 0.00974703 0.00958514 0.00977182 0.00971794 0.0096283 0.00964093 0.0096209 0.00963807] mean value: 0.009688329696655274 key: score_time value: [0.00931573 0.00920796 0.00928688 0.00928712 0.0091939 0.00921798 0.00923729 0.00919628 0.00921893 0.00846362] mean value: 0.009162569046020507 key: test_mcc value: [ 0.44444444 0.32732684 0. 0. 0. 0.57735027 -0.28867513 0. 0.06666667 0.33734954] mean value: 0.14644626235299057 key: train_mcc value: [0.43769978 0.45554586 0.45215696 0.49115256 0.4455592 0.44919673 0.49718111 0.42789983 0.44816116 0.45628689] mean value: 0.4560840076371738 key: test_accuracy value: [0.73333333 0.66666667 0.53333333 0.46666667 0.53333333 0.8 0.4 0.53333333 0.57142857 0.71428571] mean value: 0.5952380952380952 key: train_accuracy value: [0.73684211 0.7443609 0.7443609 0.7593985 0.73684211 0.7443609 0.76691729 0.72932331 0.73880597 0.73880597] mean value: 0.7440017955336101 key: test_fscore value: [0.77777778 0.70588235 0.63157895 0.42857143 0.63157895 0.84210526 0.52631579 0.63157895 0.66666667 0.8 ] mean value: 0.6642056120693891 key: train_fscore value: [0.79041916 0.79518072 0.79761905 0.80487805 0.78527607 0.8 0.81871345 0.7804878 0.78787879 0.7826087 ] mean value: 0.7943061793288788 key: test_precision value: [0.77777778 0.75 0.6 0.6 0.6 0.8 0.5 0.66666667 0.66666667 0.72727273] mean value: 0.6688383838383838 key: train_precision value: [0.77647059 0.78571429 0.77906977 0.80487805 0.79012346 0.77272727 0.78651685 0.77108434 0.78313253 0.79746835] mean value: 0.7847185495522168 key: test_recall value: [0.77777778 0.66666667 
0.66666667 0.33333333 0.66666667 0.88888889 0.55555556 0.6 0.66666667 0.88888889] mean value: 0.6711111111111111 key: train_recall value: [0.80487805 0.80487805 0.81707317 0.80487805 0.7804878 0.82926829 0.85365854 0.79012346 0.79268293 0.76829268] mean value: 0.8046221017765733 key: test_roc_auc value: [0.72222222 0.66666667 0.5 0.5 0.5 0.77777778 0.36111111 0.5 0.53333333 0.64444444] mean value: 0.5705555555555556 key: train_roc_auc value: [0.71616451 0.72596844 0.72226208 0.74557628 0.72357724 0.71855571 0.74055476 0.71236942 0.72326454 0.73030019] mean value: 0.7258593163483168 key: test_jcc value: [0.63636364 0.54545455 0.46153846 0.27272727 0.46153846 0.72727273 0.35714286 0.46153846 0.5 0.66666667] mean value: 0.5090243090243091 key: train_jcc value: [0.65346535 0.66 0.66336634 0.67346939 0.64646465 0.66666667 0.69306931 0.64 0.65 0.64285714] mean value: 0.6589358833842568 MCC on Blind test: 0.31 Accuracy on Blind test: 0.68 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00855732 0.02237892 0.00854349 0.00832772 0.00829554 0.00820947 0.0084157 0.00815558 0.0094521 0.00837159] mean value: 0.00987074375152588 key: score_time value: [0.04929829 0.02021766 0.01471043 0.00962043 0.0139358 0.0139904 0.00954151 0.01637411 0.01522803 0.01381803] mean value: 0.017673468589782713 key: test_mcc value: [-0.06804138 -0.28867513 -0.66666667 -0.06804138 0. 0.43082022 -0.21821789 -0.5 0.3721042 0.3721042 ] mean value: -0.06346138290225109 key: train_mcc value: [0.17136979 0.36243575 0.31069717 0.35315618 0.28850942 0.2741202 0.28523142 0.38382318 0.30552803 0.2083236 ] mean value: 0.29431947475155223 key: test_accuracy value: [0.53333333 0.4 0.2 0.53333333 0.53333333 0.73333333 0.4 0.33333333 0.71428571 0.71428571] mean value: 0.5095238095238095 key: train_accuracy value: [0.62406015 0.70676692 0.68421053 0.70676692 0.67669173 0.67669173 0.67669173 0.71428571 0.67910448 0.64179104] mean value: 0.6787060935921894 key: test_fscore value: [0.66666667 0.52631579 0.33333333 0.66666667 0.63157895 0.8 0.47058824 0.5 0.81818182 0.81818182] mean value: 0.6231513275166526 key: train_fscore value: [0.71590909 0.77456647 0.75862069 0.78453039 0.75706215 0.77248677 0.75977654 0.77906977 0.75144509 0.73333333] mean value: 0.7586800284465707 key: test_precision value: [0.58333333 0.5 0.33333333 0.58333333 0.6 0.72727273 0.5 0.5 0.69230769 0.69230769] mean value: 0.5711888111888112 key: train_precision value: [0.67021277 0.73626374 0.7173913 0.71717172 0.70526316 0.68224299 0.70103093 0.73626374 0.71428571 0.67346939] mean value: 0.7053595438429273 key: 
test_recall value: [0.77777778 0.55555556 0.33333333 0.77777778 0.66666667 0.88888889 0.44444444 0.5 1. 1. ] mean value: 0.6944444444444444 key: train_recall value: [0.76829268 0.81707317 0.80487805 0.86585366 0.81707317 0.8902439 0.82926829 0.82716049 0.79268293 0.80487805] mean value: 0.8217404396266185 key: test_roc_auc value: [0.47222222 0.36111111 0.16666667 0.47222222 0.5 0.69444444 0.38888889 0.25 0.6 0.6 ] mean value: 0.45055555555555554 key: train_roc_auc value: [0.58022477 0.67324247 0.64753706 0.65841703 0.63402678 0.61178862 0.63032042 0.68281102 0.64634146 0.59474672] mean value: 0.6359456345946064 key: test_jcc value: [0.5 0.35714286 0.2 0.5 0.46153846 0.66666667 0.30769231 0.33333333 0.69230769 0.69230769] mean value: 0.47109890109890107 key: train_jcc value: [0.55752212 0.63207547 0.61111111 0.64545455 0.60909091 0.62931034 0.61261261 0.63809524 0.60185185 0.57894737] mean value: 0.6116071577056825 MCC on Blind test: -0.02 Accuracy on Blind test: 0.54 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01080036 0.01022005 0.01002049 0.01102018 0.01053524 0.00990343 0.01091599 0.00984478 0.01048589 0.01073265] mean value: 0.01044790744781494 key: score_time value: [0.00896406 0.00891638 0.00948572 0.00896621 0.00873971 0.00927854 0.00958657 0.00880337 0.00961518 0.00933743] mean value: 0.009169316291809082 key: test_mcc value: [ 0.48038446 -0.21821789 0.27216553 -0.18463724 -0.06804138 0.61237244 -0.32025631 0.13867505 0.3721042 0. ] mean value: 0.10845488608517544 key: train_mcc value: [0.61007042 0.6473291 0.58606018 0.63995699 0.57981496 0.51766191 0.63995699 0.5556364 0.58656282 0.57162035] mean value: 0.5934670121246945 key: test_accuracy value: [0.73333333 0.53333333 0.66666667 0.46666667 0.53333333 0.8 0.46666667 0.66666667 0.71428571 0.64285714] mean value: 0.6223809523809524 key: train_accuracy value: [0.80451128 0.82706767 0.79699248 0.81954887 0.78947368 0.7593985 0.81954887 0.77443609 0.79104478 0.78358209] mean value: 0.7965604309280664 key: test_fscore value: [0.81818182 0.69565217 0.76190476 0.6 0.66666667 0.85714286 0.63636364 0.7826087 0.81818182 0.7826087 ] mean value: 0.7419311123658949 key: train_fscore value: [0.86315789 0.87567568 0.85714286 0.87234043 0.85416667 0.83673469 0.87234043 0.84375 0.85416667 0.84974093] mean value: 0.8579216238472576 key: test_precision value: [0.69230769 0.57142857 0.66666667 0.54545455 0.58333333 0.75 0.53846154 0.69230769 0.69230769 0.64285714] mean value: 0.6375124875124875 key: train_precision value: [0.75925926 0.78640777 0.75700935 0.77358491 0.74545455 0.71929825 0.77358491 0.72972973 0.74545455 
0.73873874] mean value: 0.7528521988356293 key: test_recall value: [1. 0.88888889 0.88888889 0.66666667 0.77777778 1. 0.77777778 0.9 1. 1. ] mean value: 0.89 key: train_recall value: [1. 0.98780488 0.98780488 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9975609756097561 key: test_roc_auc value: [0.66666667 0.44444444 0.61111111 0.41666667 0.47222222 0.75 0.38888889 0.55 0.6 0.5 ] mean value: 0.54 key: train_roc_auc value: [0.74509804 0.77821616 0.73900048 0.76470588 0.7254902 0.68627451 0.76470588 0.71153846 0.73076923 0.72115385] mean value: 0.7366952691020123 key: test_jcc value: [0.69230769 0.53333333 0.61538462 0.42857143 0.5 0.75 0.46666667 0.64285714 0.69230769 0.64285714] mean value: 0.5964285714285714 key: train_jcc value: [0.75925926 0.77884615 0.75 0.77358491 0.74545455 0.71929825 0.77358491 0.72972973 0.74545455 0.73873874] mean value: 0.7513951029417762 MCC on Blind test: 0.34 Accuracy on Blind test: 0.7 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.67545986 0.81942654 0.68312645 0.5726397 0.66584706 0.72752714 0.56442165 1.12021947 0.92654943 0.62951541] mean value: 0.7384732723236084 key: score_time value: [0.01198483 0.01239991 0.01204967 0.01192904 0.01209116 0.01514149 0.01191807 0.01218581 0.01198363 0.01201415] mean value: 0.012369775772094726 key: test_mcc value: [ 0.44444444 -0.32025631 0. 0.38888889 0.57735027 0.32732684 -0.06804138 0.09449112 0.55943093 0.54772256] mean value: 0.2551357352065785 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73333333 0.46666667 0.53333333 0.66666667 0.8 0.66666667 0.53333333 0.53333333 0.78571429 0.78571429] mean value: 0.6504761904761904 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.77777778 0.63636364 0.63157895 0.66666667 0.84210526 0.70588235 0.66666667 0.58823529 0.82352941 0.85714286] mean value: 0.719594887396745 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.77777778 0.53846154 0.6 0.83333333 0.8 0.75 0.58333333 0.71428571 0.875 0.75 ] mean value: 0.7222191697191698 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 0.77777778 0.66666667 0.55555556 0.88888889 0.66666667 0.77777778 0.5 0.77777778 1. ] mean value: 0.7388888888888889 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.72222222 0.38888889 0.5 0.69444444 0.77777778 0.66666667 0.47222222 0.55 0.78888889 0.7 ] mean value: 0.6261111111111112 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.63636364 0.46666667 0.46153846 0.5 0.72727273 0.54545455 0.5 0.41666667 0.7 0.75 ] mean value: 0.5703962703962704 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.34 Accuracy on Blind test: 0.7 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', 
MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01971579 0.01306844 0.01349854 0.0138483 0.01513457 0.01593947 0.01379418 0.01406312 0.01403928 0.01425767] mean value: 0.014735937118530273 key: score_time value: [0.01813078 0.00960183 0.00989127 0.01094699 0.01026487 0.00997877 0.01040626 0.01031375 0.01008248 0.01022696] mean value: 0.010984396934509278 key: test_mcc value: [0.73854895 0.8660254 0.87287156 0.8660254 0.72222222 0.57735027 0.6000992 0.85280287 0.86066297 0.84852814] mean value: 0.7805136972619839 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.86666667 0.93333333 0.93333333 0.93333333 0.86666667 0.8 0.8 0.93333333 0.92857143 0.92857143] mean value: 0.8923809523809524 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.94736842 0.94117647 0.94736842 0.88888889 0.84210526 0.82352941 0.95238095 0.94117647 0.94736842] mean value: 0.9131362720526808 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 0.9 1. 0.9 0.88888889 0.8 0.875 0.90909091 1. 0.9 ] mean value: 0.8991161616161616 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.88888889 1. 0.88888889 0.88888889 0.77777778 1. 0.88888889 1. ] mean value: 0.9333333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.91666667 0.94444444 0.91666667 0.86111111 0.77777778 0.80555556 0.9 0.94444444 0.9 ] mean value: 0.88 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.81818182 0.9 0.88888889 0.9 0.8 0.72727273 0.7 0.90909091 0.88888889 0.9 ] mean value: 0.8432323232323232 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.11488128 0.09547067 0.09398174 0.09794021 0.09405684 0.09143519 0.08864164 0.08879304 0.089674 0.09006691] mean value: 0.0944941520690918 key: score_time value: [0.01881909 0.01698303 0.01879811 0.01835728 0.01962328 0.01830673 0.01720667 0.01846004 0.0170567 0.01702929] mean value: 0.018064022064208984 key: test_mcc value: [0.73854895 0.12309149 0.12309149 0.49099025 0.16666667 0.57735027 0.28867513 0.21320072 0.68888889 0.3721042 ] mean value: 0.3782608060328875 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86666667 0.6 0.6 0.73333333 0.6 0.8 0.66666667 0.66666667 0.85714286 0.71428571] mean value: 0.7104761904761905 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.7 0.7 0.75 0.66666667 0.84210526 0.73684211 0.76190476 0.88888889 0.81818182] mean value: 0.7764589504063188 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.81818182 0.63636364 0.63636364 0.85714286 0.66666667 0.8 0.7 0.72727273 0.88888889 0.69230769] mean value: 0.7423187923187923 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.77777778 0.77777778 0.66666667 0.66666667 0.88888889 0.77777778 0.8 0.88888889 1. ] mean value: 0.8244444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.55555556 0.55555556 0.75 0.58333333 0.77777778 0.63888889 0.6 0.84444444 0.6 ] mean value: 0.6738888888888889 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.81818182 0.53846154 0.53846154 0.6 0.5 0.72727273 0.58333333 0.61538462 0.8 0.69230769] mean value: 0.6413403263403263 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.35 Accuracy on Blind test: 0.7 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00981903 0.00927949 0.00883126 0.00921917 0.00866628 0.00878716 0.00877333 0.00881672 0.00892115 0.00923777] mean value: 0.009035134315490722 key: score_time value: [0.0092175 0.00844312 0.00844026 0.00871563 0.00834346 0.00851798 0.0086832 0.00851035 0.00845814 0.00913239] mean value: 0.008646202087402344 key: test_mcc value: [ 0.57735027 0.16666667 -0.21821789 0.16666667 0.16666667 0. 0. 0.1 0.25819889 0.3721042 ] mean value: 0.15894354724684198 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.8 0.6 0.4 0.6 0.6 0.53333333 0.53333333 0.6 0.64285714 0.71428571] mean value: 0.6023809523809524 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84210526 0.66666667 0.47058824 0.66666667 0.66666667 0.63157895 0.63157895 0.7 0.70588235 0.81818182] mean value: 0.6799915564311849 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.66666667 0.5 0.66666667 0.66666667 0.6 0.6 0.7 0.75 0.69230769] mean value: 0.6642307692307692 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 0.66666667 0.44444444 0.66666667 0.66666667 0.66666667 0.66666667 0.7 0.66666667 1. ] mean value: 0.7033333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77777778 0.58333333 0.38888889 0.58333333 0.58333333 0.5 0.5 0.55 0.63333333 0.6 ] mean value: 0.57 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.72727273 0.5 0.30769231 0.5 0.5 0.46153846 0.46153846 0.53846154 0.54545455 0.69230769] mean value: 0.5234265734265734 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.48 Accuracy on Blind test: 0.76 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.14825535 1.1420064 1.24736357 1.28721571 1.46088004 1.41407871 1.45253873 1.21245909 1.15079379 1.13232279] mean value: 1.2647914171218873 key: score_time value: [0.08690786 0.08760619 0.0870502 0.1696291 0.11017585 0.11227798 0.11885667 0.09209251 0.09239531 0.08728147] mean value: 0.10442731380462647 key: test_mcc value: [0.73854895 0.44444444 0.12309149 0.49099025 0.57735027 0.57735027 0.44444444 0.53300179 0.86066297 0.54772256] mean value: 0.5337607431372515 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86666667 0.73333333 0.6 0.73333333 0.8 0.8 0.73333333 0.8 0.92857143 0.78571429] mean value: 0.7780952380952381 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.77777778 0.7 0.75 0.84210526 0.84210526 0.77777778 0.85714286 0.94117647 0.85714286] mean value: 0.8245228266745295 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 0.77777778 0.63636364 0.85714286 0.8 0.8 0.77777778 0.81818182 1. 0.75 ] mean value: 0.8035425685425686 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.77777778 0.77777778 0.66666667 0.88888889 0.88888889 0.77777778 0.9 0.88888889 1. ] mean value: 0.8566666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.83333333 0.72222222 0.55555556 0.75 0.77777778 0.77777778 0.72222222 0.75 0.94444444 0.7 ] mean value: 0.7533333333333334 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.81818182 0.63636364 0.53846154 0.6 0.72727273 0.72727273 0.63636364 0.75 0.88888889 0.75 ] mean value: 0.7072804972804972 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.53 Accuracy on Blind test: 0.78 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, 
validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( key: fit_time value: [1.70673203 0.90127635 0.91046214 0.90426564 0.95201159 0.90982032 0.84366179 0.90904737 0.85923839 0.94528151] mean value: 0.9841797113418579 key: score_time value: [0.19794631 0.20943165 0.21002579 0.20278788 0.13683844 0.20249867 0.14593673 0.13761425 0.12478852 0.16785192] mean value: 0.1735720157623291 key: test_mcc value: [0.73854895 0.43082022 0.27216553 0.49099025 0.43082022 0.57735027 0.43082022 0.7 0.84852814 0.3721042 ] mean value: 0.5292147991546989 key: train_mcc value: [0.85869998 0.87269455 0.88951136 0.88951136 0.87406606 0.88951136 0.92202167 0.89249493 0.87565664 0.90622006] mean value: 0.8870387975267908 key: test_accuracy value: [0.86666667 0.73333333 0.66666667 0.73333333 0.73333333 0.8 0.73333333 0.86666667 0.92857143 0.71428571] mean value: 0.7776190476190477 key: train_accuracy value: [0.93233083 0.93984962 0.94736842 0.94736842 0.93984962 0.94736842 0.96240602 0.94736842 0.94029851 0.95522388] mean value: 0.9459432162495791 key: test_fscore value: [0.9 0.8 0.76190476 0.75 0.8 0.84210526 0.8 0.9 0.94736842 0.81818182] mean value: 0.8319560264297107 key: train_fscore value: [0.94736842 0.95238095 0.95857988 0.95857988 0.95294118 0.95857988 0.9704142 0.95857988 0.95294118 0.96428571] mean value: 0.9574651168471126 key: test_precision value: [0.81818182 0.72727273 0.66666667 0.85714286 0.72727273 0.8 0.72727273 0.9 0.9 0.69230769] mean value: 0.7816117216117217 key: train_precision value: [0.91011236 0.93023256 0.93103448 0.93103448 0.92045455 0.93103448 0.94252874 0.92045455 0.92045455 0.94186047] mean value: 0.9279201203078058 key: test_recall value: [1. 0.88888889 0.88888889 0.66666667 0.88888889 0.88888889 0.88888889 0.9 1. 1. ] mean value: 0.9011111111111111 key: train_recall value: [0.98780488 0.97560976 0.98780488 0.98780488 0.98780488 0.98780488 1. 1. 
0.98780488 0.98780488] mean value: 0.9890243902439024 key: test_roc_auc value: [0.83333333 0.69444444 0.61111111 0.75 0.69444444 0.77777778 0.69444444 0.85 0.9 0.6 ] mean value: 0.7405555555555555 key: train_roc_auc value: [0.91547107 0.92898135 0.93507891 0.93507891 0.92527499 0.93507891 0.95098039 0.93269231 0.92659475 0.94582552] mean value: 0.9331057094507597 key: test_jcc value: [0.81818182 0.66666667 0.61538462 0.6 0.66666667 0.72727273 0.66666667 0.81818182 0.9 0.69230769] mean value: 0.7171328671328672 key: train_jcc value: [0.9 0.90909091 0.92045455 0.92045455 0.91011236 0.92045455 0.94252874 0.92045455 0.91011236 0.93103448] mean value: 0.9184697028401019 MCC on Blind test: 0.71 Accuracy on Blind test: 0.86 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, 
reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01027036 0.01250505 0.01518893 0.01096773 0.00941038 0.01338744 0.00863194 0.01007581 0.01536703 0.01049685] mean value: 0.01163015365600586 key: score_time value: [0.01019716 0.0142858 0.01393628 0.01013446 0.00880861 0.00876451 0.00855732 0.01425838 0.01521611 0.01059222] mean value: 0.011475086212158203 key: test_mcc value: [ 0.44444444 0.32732684 0. 0. 0. 0.57735027 -0.28867513 0. 
0.06666667 0.33734954] mean value: 0.14644626235299057 key: train_mcc value: [0.43769978 0.45554586 0.45215696 0.49115256 0.4455592 0.44919673 0.49718111 0.42789983 0.44816116 0.45628689] mean value: 0.4560840076371738 key: test_accuracy value: [0.73333333 0.66666667 0.53333333 0.46666667 0.53333333 0.8 0.4 0.53333333 0.57142857 0.71428571] mean value: 0.5952380952380952 key: train_accuracy value: [0.73684211 0.7443609 0.7443609 0.7593985 0.73684211 0.7443609 0.76691729 0.72932331 0.73880597 0.73880597] mean value: 0.7440017955336101 key: test_fscore value: [0.77777778 0.70588235 0.63157895 0.42857143 0.63157895 0.84210526 0.52631579 0.63157895 0.66666667 0.8 ] mean value: 0.6642056120693891 key: train_fscore value: [0.79041916 0.79518072 0.79761905 0.80487805 0.78527607 0.8 0.81871345 0.7804878 0.78787879 0.7826087 ] mean value: 0.7943061793288788 key: test_precision value: [0.77777778 0.75 0.6 0.6 0.6 0.8 0.5 0.66666667 0.66666667 0.72727273] mean value: 0.6688383838383838 key: train_precision value: [0.77647059 0.78571429 0.77906977 0.80487805 0.79012346 0.77272727 0.78651685 0.77108434 0.78313253 0.79746835] mean value: 0.7847185495522168 key: test_recall value: [0.77777778 0.66666667 0.66666667 0.33333333 0.66666667 0.88888889 0.55555556 0.6 0.66666667 0.88888889] mean value: 0.6711111111111111 key: train_recall value: [0.80487805 0.80487805 0.81707317 0.80487805 0.7804878 0.82926829 0.85365854 0.79012346 0.79268293 0.76829268] mean value: 0.8046221017765733 key: test_roc_auc value: [0.72222222 0.66666667 0.5 0.5 0.5 0.77777778 0.36111111 0.5 0.53333333 0.64444444] mean value: 0.5705555555555556 key: train_roc_auc value: [0.71616451 0.72596844 0.72226208 0.74557628 0.72357724 0.71855571 0.74055476 0.71236942 0.72326454 0.73030019] mean value: 0.7258593163483168 key: test_jcc value: [0.63636364 0.54545455 0.46153846 0.27272727 0.46153846 0.72727273 0.35714286 0.46153846 0.5 0.66666667] mean value: 0.5090243090243091 key: train_jcc value: [0.65346535 0.66 
0.66336634 0.67346939 0.64646465 0.66666667 0.69306931 0.64 0.65 0.64285714] mean value: 0.6589358833842568 MCC on Blind test: 0.31 Accuracy on Blind test: 0.68 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, 
reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.16157722 0.43932223 0.04518628 0.13460135 0.04375291 0.04577804 0.04375458 0.04748416 0.05378747 0.05692935] mean value: 0.10721735954284668 key: score_time value: [0.01405382 0.01124883 0.01167321 0.01080108 0.01058769 0.01020837 0.01086092 0.01022172 0.01022148 0.01110244] mean value: 0.011097955703735351 key: test_mcc value: [0.73854895 0.57735027 0.87287156 0.8660254 0.8660254 0.57735027 0.72222222 1. 1. 0.51854497] mean value: 0.7738939047860451 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86666667 0.8 0.93333333 0.93333333 0.93333333 0.8 0.86666667 1. 1. 0.78571429] mean value: 0.891904761904762 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.84210526 0.94117647 0.94736842 0.94736842 0.84210526 0.88888889 1. 1. 0.84210526] mean value: 0.9151117991056071 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 0.8 1. 0.9 0.9 0.8 0.88888889 1. 1. 0.8 ] mean value: 0.8907070707070708 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.88888889 0.88888889 1. 1. 0.88888889 0.88888889 1. 1. 0.88888889] mean value: 0.9444444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.77777778 0.94444444 0.91666667 0.91666667 0.77777778 0.86111111 1. 1. 0.74444444] mean value: 0.8772222222222222 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.81818182 0.72727273 0.88888889 0.9 0.9 0.72727273 0.8 1. 1. 0.72727273] mean value: 0.8488888888888889 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.83 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, 
oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.04252434 0.05116844 0.05966687 0.05816221 0.04887724 0.05299878 0.0588336 0.0550344 0.06454182 0.05411196] mean value: 0.054591965675354 key: score_time value: [0.01209831 0.0214119 0.02118158 0.02093101 0.02505398 0.02150726 0.02001572 0.02202773 0.02196813 0.02339149] mean value: 0.020958709716796874 key: test_mcc value: [ 0.05455447 0.16666667 0.32732684 0.66666667 0.6000992 0.72222222 0.32732684 -0.53300179 0.74535599 0.84852814] mean value: 0.39257452360062706 key: train_mcc value: [0.98416472 1. 1. 1. 0.98416472 1. 1. 0.98428077 0.98435397 1. ] mean value: 0.9936964183102901 key: test_accuracy value: [0.53333333 0.6 0.66666667 0.8 0.8 0.86666667 0.66666667 0.2 0.85714286 0.92857143] mean value: 0.6919047619047619 key: train_accuracy value: [0.9924812 1. 1. 1. 0.9924812 1. 1. 0.9924812 0.99253731 1. 
] mean value: 0.9969980922455393 key: test_fscore value: [0.58823529 0.66666667 0.70588235 0.8 0.82352941 0.88888889 0.70588235 0.14285714 0.875 0.94736842] mean value: 0.7144310531230036 key: train_fscore value: [0.99393939 1. 1. 1. 0.99393939 1. 1. 0.99386503 0.99393939 1. ] mean value: 0.9975683212493028 key: test_precision value: [0.625 0.66666667 0.75 1. 0.875 0.88888889 0.75 0.25 1. 0.9 ] mean value: 0.7705555555555555 key: train_precision value: [0.98795181 1. 1. 1. 0.98795181 1. 1. 0.98780488 0.98795181 1. ] mean value: 0.9951660299735527 key: test_recall value: [0.55555556 0.66666667 0.66666667 0.66666667 0.77777778 0.88888889 0.66666667 0.1 0.77777778 1. ] mean value: 0.6766666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.52777778 0.58333333 0.66666667 0.83333333 0.80555556 0.86111111 0.66666667 0.25 0.88888889 0.9 ] mean value: 0.6983333333333334 key: train_roc_auc value: [0.99019608 1. 1. 1. 0.99019608 1. 1. 0.99038462 0.99038462 1. ] mean value: 0.9961161387631976 key: test_jcc value: [0.41666667 0.5 0.54545455 0.66666667 0.7 0.8 0.54545455 0.07692308 0.77777778 0.9 ] mean value: 0.5928943278943279 key: train_jcc value: [0.98795181 1. 1. 1. 0.98795181 1. 1. 0.98780488 0.98795181 1. 
] mean value: 0.9951660299735527 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01957345 0.00936222 0.00870514 0.00975537 0.00946307 0.00951624 0.00876665 0.01013446 0.00984979 0.01234937] mean value: 0.010747575759887695 key: score_time value: [0.00978994 0.00869393 0.00832486 0.00907922 0.00906873 0.00860548 0.00923014 0.01038742 0.00961089 0.01400924] mean value: 0.00967998504638672 key: test_mcc value: [ 0.28867513 -0.06804138 0.48038446 0. 
0.28867513 0.61237244 0.27216553 0.18898224 0.33734954 0.3721042 ] mean value: 0.27726672942748454 key: train_mcc value: [0.42695156 0.40914219 0.41362409 0.40914219 0.44310968 0.46233819 0.40761269 0.51648972 0.41757429 0.36234681] mean value: 0.4268331407675872 key: test_accuracy value: [0.66666667 0.53333333 0.73333333 0.53333333 0.6 0.8 0.66666667 0.6 0.71428571 0.71428571] mean value: 0.6561904761904762 key: train_accuracy value: [0.73684211 0.72932331 0.72932331 0.72932331 0.7443609 0.7518797 0.72932331 0.77443609 0.73134328 0.70895522] mean value: 0.7365110537537874 key: test_fscore value: [0.73684211 0.66666667 0.81818182 0.63157895 0.57142857 0.85714286 0.76190476 0.66666667 0.8 0.81818182] mean value: 0.7328594212804739 key: train_fscore value: [0.8 0.79545455 0.79069767 0.79545455 0.80681818 0.82162162 0.79775281 0.82954545 0.79545455 0.78688525] mean value: 0.8019684623657902 key: test_precision value: [0.7 0.58333333 0.69230769 0.6 0.8 0.75 0.66666667 0.75 0.72727273 0.69230769] mean value: 0.6961888111888112 key: train_precision value: [0.75268817 0.74468085 0.75555556 0.74468085 0.75531915 0.73786408 0.73958333 0.76842105 0.74468085 0.71287129] mean value: 0.7456345180489754 key: test_recall value: [0.77777778 0.77777778 1. 0.66666667 0.44444444 1. 0.88888889 0.6 0.88888889 1. 
] mean value: 0.8044444444444444 key: train_recall value: [0.85365854 0.85365854 0.82926829 0.85365854 0.86585366 0.92682927 0.86585366 0.90123457 0.85365854 0.87804878] mean value: 0.8681722372779284 key: test_roc_auc value: [0.63888889 0.47222222 0.66666667 0.5 0.63888889 0.75 0.61111111 0.6 0.64444444 0.6 ] mean value: 0.6122222222222222 key: train_roc_auc value: [0.70133907 0.69153515 0.69894787 0.69153515 0.70743663 0.69870875 0.68782879 0.73907882 0.69606004 0.66017824] mean value: 0.6972648516706383 key: test_jcc value: [0.58333333 0.5 0.69230769 0.46153846 0.4 0.75 0.61538462 0.5 0.66666667 0.69230769] mean value: 0.5861538461538461 key: train_jcc value: [0.66666667 0.66037736 0.65384615 0.66037736 0.67619048 0.69724771 0.6635514 0.70873786 0.66037736 0.64864865] mean value: 0.6696020993192491 MCC on Blind test: 0.22 Accuracy on Blind test: 0.65 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, 
max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01204324 0.01354265 0.01453185 0.01378798 0.01325369 0.01510215 0.01632333 0.01447392 0.01603985 0.01390338] mean value: 0.014300203323364258 key: score_time value: [0.00963211 0.01172948 0.01188231 0.01177025 0.01262784 0.01211071 0.01205111 0.01195526 0.01205778 0.02415824] mean value: 0.01299750804901123 key: test_mcc value: [ 0.49099025 -0.21821789 0.28867513 0.38888889 0. 0.38888889 0.44444444 0.35355339 0.64549722 0.54772256] mean value: 0.33304428920783685 key: train_mcc value: [0.7481685 0.53345478 0.92239408 0.68894951 0.41644772 0.83129833 0.96819703 0.57295971 0.83282505 0.76522585] mean value: 0.7279920567315256 key: test_accuracy value: [0.73333333 0.53333333 0.66666667 0.66666667 0.4 0.66666667 0.73333333 0.53333333 0.78571429 0.78571429] mean value: 0.6504761904761904 key: train_accuracy value: [0.85714286 0.76691729 0.96240602 0.84210526 0.60150376 0.90977444 0.98496241 0.72932331 0.91044776 0.8880597 ] mean value: 0.8452642801032432 key: test_fscore value: [0.75 0.69565217 0.73684211 0.66666667 0. 0.66666667 0.77777778 0.46153846 0.8 0.85714286] mean value: 0.6412286708968631 key: train_fscore value: [0.86896552 0.84102564 0.9689441 0.8627451 0.52252252 0.92105263 0.98780488 0.71428571 0.92105263 0.90797546] mean value: 0.851637419382273 key: test_precision value: [0.85714286 0.57142857 0.7 0.83333333 0. 0.83333333 0.77777778 1. 1. 0.75 ] mean value: 0.7323015873015873 key: train_precision value: [1. 0.72566372 0.98734177 0.92957746 1. 1. 0.98780488 1. 1. 
0.91358025] mean value: 0.9543968078717151 key: test_recall value: [0.66666667 0.88888889 0.77777778 0.55555556 0. 0.55555556 0.77777778 0.3 0.66666667 1. ] mean value: 0.6188888888888889 key: train_recall value: [0.76829268 1. 0.95121951 0.80487805 0.35365854 0.85365854 0.98780488 0.55555556 0.85365854 0.90243902] mean value: 0.8031165311653117 key: test_roc_auc value: [0.75 0.44444444 0.63888889 0.69444444 0.5 0.69444444 0.72222222 0.65 0.83333333 0.7 ] mean value: 0.6627777777777778 key: train_roc_auc value: [0.88414634 0.69607843 0.96580583 0.85341942 0.67682927 0.92682927 0.98409852 0.77777778 0.92682927 0.88391182] mean value: 0.8575725943911022 key: test_jcc value: [0.6 0.53333333 0.58333333 0.5 0. 0.5 0.63636364 0.3 0.66666667 0.75 ] mean value: 0.506969696969697 key: train_jcc value: [0.76829268 0.72566372 0.93975904 0.75862069 0.35365854 0.85365854 0.97590361 0.55555556 0.85365854 0.83146067] mean value: 0.7616231579467527 MCC on Blind test: 0.49 Accuracy on Blind test: 0.76 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01271224 0.0140152 0.01353335 0.01386046 0.0133822 0.01415133 0.01415229 0.01478529 0.01401305 0.01370788] mean value: 0.013831329345703126 key: score_time value: [0.00936437 0.01232767 0.01146793 0.01134229 0.01132178 0.0113194 0.01169276 0.01195049 0.01243901 0.01177311] mean value: 0.011499881744384766 key: test_mcc value: [ 0.32732684 0.27216553 0.16666667 -0.08006408 0.61237244 0.72222222 0.28867513 0.5547002 1. 0.54772256] mean value: 0.4411787498337245 key: train_mcc value: [0.84393984 0.96845676 0.83443276 0.42561819 0.73349852 0.81675202 0.98428077 0.73141304 0.92326075 0.82277852] mean value: 0.808443116758431 key: test_accuracy value: [0.66666667 0.66666667 0.6 0.4 0.8 0.86666667 0.66666667 0.8 1. 0.78571429] mean value: 0.7252380952380952 key: train_accuracy value: [0.91729323 0.98496241 0.91729323 0.60902256 0.87218045 0.90977444 0.9924812 0.86466165 0.96268657 0.91044776] mean value: 0.894080350129054 key: test_fscore value: [0.70588235 0.76190476 0.66666667 0.18181818 0.85714286 0.88888889 0.73684211 0.86956522 1. 0.85714286] mean value: 0.7525853889159853 key: train_fscore value: [0.92810458 0.98795181 0.92993631 0.53571429 0.9039548 0.93181818 0.99386503 0.9 0.9689441 0.92307692] mean value: 0.9003366011047804 key: test_precision value: [0.75 0.66666667 0.66666667 0.5 0.75 0.88888889 0.7 0.76923077 1. 0.75 ] mean value: 0.7441452991452991 key: train_precision value: [1. 0.97619048 0.97333333 1. 0.84210526 0.87234043 1. 
0.81818182 0.98734177 0.97297297] mean value: 0.9442466061520309 key: test_recall value: [0.66666667 0.88888889 0.66666667 0.11111111 1. 0.88888889 0.77777778 1. 1. 1. ] mean value: 0.7999999999999999 key: train_recall value: [0.86585366 1. 0.8902439 0.36585366 0.97560976 1. 0.98780488 1. 0.95121951 0.87804878] mean value: 0.8914634146341464 key: test_roc_auc value: [0.66666667 0.61111111 0.58333333 0.47222222 0.75 0.86111111 0.63888889 0.7 1. 0.7 ] mean value: 0.6983333333333334 key: train_roc_auc value: [0.93292683 0.98039216 0.92551411 0.68292683 0.84074605 0.88235294 0.99390244 0.82692308 0.96599437 0.91979362] mean value: 0.8951472427620204 key: test_jcc value: [0.54545455 0.61538462 0.5 0.1 0.75 0.8 0.58333333 0.76923077 1. 0.75 ] mean value: 0.6413403263403263 key: train_jcc value: [0.86585366 0.97619048 0.86904762 0.36585366 0.82474227 0.87234043 0.98780488 0.81818182 0.93975904 0.85714286] mean value: 0.8376916695402452 MCC on Blind test: 0.46 Accuracy on Blind test: 0.7 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, 
gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.10654593 0.12945461 0.13125443 0.12465715 0.10133481 0.15334868 0.11432934 0.11598229 0.12642503 0.13426089] mean value: 0.12375931739807129 key: score_time value: [0.01605654 0.01604772 0.01675987 0.01631927 0.02109814 0.02507401 0.01946521 0.0151999 0.01829481 0.01699734] mean value: 0.018131279945373537 key: test_mcc value: [0.73854895 0.57735027 1. 0.8660254 0.6000992 0.57735027 0.72222222 0.70710678 1. 0.84852814] mean value: 0.7637231227021293 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86666667 0.8 1. 0.93333333 0.8 0.8 0.86666667 0.86666667 1. 0.92857143] mean value: 0.8861904761904762 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.84210526 1. 0.94736842 0.82352941 0.84210526 0.88888889 0.90909091 1. 0.94736842] mean value: 0.9100456578165557 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 0.8 1. 0.9 0.875 0.8 0.88888889 0.83333333 1. 0.9 ] mean value: 0.881540404040404 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.88888889 1. 1. 0.77777778 0.88888889 0.88888889 1. 1. 1. ] mean value: 0.9444444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.77777778 1. 0.91666667 0.80555556 0.77777778 0.86111111 0.8 1. 0.9 ] mean value: 0.8672222222222222 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.81818182 0.72727273 1. 0.9 0.7 0.72727273 0.8 0.83333333 1. 0.9 ] mean value: 0.8406060606060606 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0503726 0.02812481 0.04068756 0.03008389 0.02571535 0.06941533 0.03493547 0.04549479 0.02969337 0.03361511] mean value: 0.03881382942199707 key: score_time value: [0.01990795 0.01710773 0.02853751 0.02316236 0.02038765 0.018929 0.02127457 0.02027416 0.0221498 0.02187991] mean value: 0.021361064910888673 key: test_mcc value: [0.73854895 0.8660254 0.87287156 0.72222222 0.8660254 0.73854895 0.72222222 0.7 0.74535599 0.84852814] mean value: 0.7820348834633071 key: train_mcc value: [1. 0.98428077 0.96891398 1. 0.98416472 1. 1. 0.96869441 1. 0.96857411] mean value: 0.9874627986897856 key: test_accuracy value: [0.86666667 0.93333333 0.93333333 0.86666667 0.93333333 0.86666667 0.86666667 0.86666667 0.85714286 0.92857143] mean value: 0.891904761904762 key: train_accuracy value: [1. 0.9924812 0.98496241 1. 0.9924812 1. 1. 0.98496241 1. 
0.98507463] mean value: 0.9939961844910784 key: test_fscore value: [0.9 0.94736842 0.94117647 0.88888889 0.94736842 0.9 0.88888889 0.9 0.875 0.94736842] mean value: 0.9136059511523908 key: train_fscore value: [1. 0.99386503 0.98765432 1. 0.99393939 1. 1. 0.98780488 1. 0.98780488] mean value: 0.9951068501699456 key: test_precision value: [0.81818182 0.9 1. 0.88888889 0.9 0.81818182 0.88888889 0.9 1. 0.9 ] mean value: 0.9014141414141414 key: train_precision value: [1. 1. 1. 1. 0.98795181 1. 1. 0.97590361 1. 0.98780488] mean value: 0.9951660299735527 key: test_recall value: [1. 1. 0.88888889 0.88888889 1. 1. 0.88888889 0.9 0.77777778 1. ] mean value: 0.9344444444444444 key: train_recall value: [1. 0.98780488 0.97560976 1. 1. 1. 1. 1. 1. 0.98780488] mean value: 0.9951219512195122 key: test_roc_auc value: [0.83333333 0.91666667 0.94444444 0.86111111 0.91666667 0.83333333 0.86111111 0.85 0.88888889 0.9 ] mean value: 0.8805555555555555 key: train_roc_auc value: [1. 0.99390244 0.98780488 1. 0.99019608 1. 1. 0.98076923 1. 0.98428705] mean value: 0.993695968068278 key: test_jcc value: [0.81818182 0.9 0.88888889 0.8 0.9 0.81818182 0.8 0.81818182 0.77777778 0.9 ] mean value: 0.8421212121212122 key: train_jcc value: [1. 0.98780488 0.97560976 1. 0.98795181 1. 1. 0.97590361 1. 
0.97590361] mean value: 0.990317367029092 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.11155105 0.13867617 0.09597683 0.06546187 0.05309463 0.06110954 0.06886148 0.10508013 0.0533936 0.06201959] mean value: 0.08152248859405517 key: score_time value: [0.02223778 0.02060914 0.0354867 0.03397918 0.03515315 0.0243063 0.02288532 0.02245426 0.02287269 0.0232141 ] mean value: 0.02631986141204834 key: test_mcc value: [ 0.43082022 -0.06804138 -0.21821789 -0.28867513 0. 0.72222222 -0.28867513 0.13867505 0.06666667 0.70064905] mean value: 0.1195423664948636 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73333333 0.53333333 0.4 0.4 0.53333333 0.86666667 0.4 0.66666667 0.57142857 0.85714286] mean value: 0.5961904761904762 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.66666667 0.47058824 0.52631579 0.63157895 0.88888889 0.52631579 0.7826087 0.66666667 0.9 ] mean value: 0.6859629679484303 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.72727273 0.58333333 0.5 0.5 0.6 0.88888889 0.5 0.69230769 0.66666667 0.81818182] mean value: 0.6476651126651126 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 0.77777778 0.44444444 0.55555556 0.66666667 0.88888889 0.55555556 0.9 0.66666667 1. ] mean value: 0.7344444444444445 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.69444444 0.47222222 0.38888889 0.36111111 0.5 0.86111111 0.36111111 0.55 0.53333333 0.8 ] mean value: 0.5522222222222223 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.5 0.30769231 0.35714286 0.46153846 0.8 0.35714286 0.64285714 0.5 0.81818182] mean value: 0.5411222111222111 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.02 Accuracy on Blind test: 0.54 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.33926439 0.26802969 0.26510382 0.28069091 0.28672743 0.26764488 0.2655201 0.27123427 0.26848125 0.2639358 ] mean value: 0.277663254737854 key: score_time value: [0.01455426 0.00921488 0.00932932 0.01044846 0.00904274 0.00909829 0.00927019 0.00904536 0.00913763 0.00903177] mean value: 0.009817290306091308 key: test_mcc value: [0.73854895 0.8660254 1. 0.72222222 0.8660254 0.57735027 0.72222222 0.85280287 1. 0.84852814] mean value: 0.8193725469925243 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.86666667 0.93333333 1. 0.86666667 0.93333333 0.8 0.86666667 0.93333333 1. 0.92857143] mean value: 0.9128571428571429 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.9 0.94736842 1. 0.88888889 0.94736842 0.84210526 0.88888889 0.95238095 1. 0.94736842] mean value: 0.931436925647452 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.81818182 0.9 1. 0.88888889 0.9 0.8 0.88888889 0.90909091 1. 0.9 ] mean value: 0.9005050505050505 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.88888889 1. 0.88888889 0.88888889 1. 1. 1. ] mean value: 0.9666666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.91666667 1. 0.86111111 0.91666667 0.77777778 0.86111111 0.9 1. 0.9 ] mean value: 0.8966666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.81818182 0.9 1. 0.8 0.9 0.72727273 0.8 0.90909091 1. 0.9 ] mean value: 0.8754545454545455 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01630831 0.01763463 0.03990936 0.01766634 0.0183568 0.01870799 0.01855612 0.01905155 0.02915406 0.01773953] mean value: 0.021308469772338866 key: score_time value: [0.01214647 0.01199389 0.01224542 0.01331973 0.013767 0.01297832 0.012532 0.01315141 0.01218915 0.01300168] mean value: 0.012732505798339844 key: test_mcc value: [-0.06804138 0.48038446 -0.38888889 -0.18463724 0. -0.06804138 0.43082022 -0.35355339 0.33734954 0.33734954] mean value: 0.05227414853437966 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.53333333 0.73333333 0.33333333 0.46666667 0.53333333 0.53333333 0.73333333 0.46666667 0.71428571 0.71428571] mean value: 0.5761904761904761 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.81818182 0.44444444 0.6 0.63157895 0.66666667 0.8 0.63636364 0.8 0.8 ] mean value: 0.6863902179691653 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.58333333 0.69230769 0.44444444 0.54545455 0.6 0.58333333 0.72727273 0.58333333 0.72727273 0.72727273] mean value: 0.6214024864024864 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 1. 0.44444444 0.66666667 0.66666667 0.77777778 0.88888889 0.7 0.88888889 0.88888889] mean value: 0.77 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.47222222 0.66666667 0.30555556 0.41666667 0.5 0.47222222 0.69444444 0.35 0.64444444 0.64444444] mean value: 0.5166666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.69230769 0.28571429 0.42857143 0.46153846 0.5 0.66666667 0.46666667 0.66666667 0.66666667] mean value: 0.5334798534798535 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: -0.14 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.04897857 0.03619099 0.0377605 0.0363791 0.0273664 0.03334212 0.03329372 0.03301764 0.04070854 0.03767133] mean value: 0.036470890045166016 key: score_time value: [0.01737165 0.02049923 0.02360916 0.02222991 0.02205873 0.02062511 0.01997638 0.01992583 0.02362394 0.02236056] mean value: 0.02122805118560791 key: test_mcc value: [ 0.44444444 -0.06804138 0.16666667 0.49099025 0.73854895 0.6000992 0.44444444 0.53300179 1. 0.54772256] mean value: 0.4897876919261729 key: train_mcc value: [0.96845676 0.95286855 0.95286855 0.93739264 0.88847246 0.96845676 0.95223938 0.93688296 0.95281321 0.95281321] mean value: 0.9463264497946997 key: test_accuracy value: [0.73333333 0.53333333 0.6 0.73333333 0.86666667 0.8 0.73333333 0.8 1. 0.78571429] mean value: 0.7585714285714286 key: train_accuracy value: [0.98496241 0.97744361 0.97744361 0.96992481 0.94736842 0.98496241 0.97744361 0.96992481 0.97761194 0.97761194] mean value: 0.9744697564807541 key: test_fscore value: [0.77777778 0.66666667 0.66666667 0.75 0.9 0.82352941 0.77777778 0.85714286 1. 0.85714286] mean value: 0.8076704014939309 key: train_fscore value: [0.98795181 0.98203593 0.98203593 0.97619048 0.95808383 0.98795181 0.98181818 0.97560976 0.98181818 0.98181818] mean value: 0.9795314080823169 key: test_precision value: [0.77777778 0.58333333 0.66666667 0.85714286 0.81818182 0.875 0.77777778 0.81818182 1. 
0.75 ] mean value: 0.792406204906205 key: train_precision value: [0.97619048 0.96470588 0.96470588 0.95348837 0.94117647 0.97619048 0.97590361 0.96385542 0.97590361 0.97590361] mean value: 0.9668023824828335 key: test_recall value: [0.77777778 0.77777778 0.66666667 0.66666667 1. 0.77777778 0.77777778 0.9 1. 1. ] mean value: 0.8344444444444444 key: train_recall value: [1. 1. 1. 1. 0.97560976 1. 0.98780488 0.98765432 0.98780488 0.98780488] mean value: 0.9926678711231557 key: test_roc_auc value: [0.72222222 0.47222222 0.58333333 0.75 0.83333333 0.80555556 0.72222222 0.75 1. 0.7 ] mean value: 0.7338888888888889 key: train_roc_auc value: [0.98039216 0.97058824 0.97058824 0.96078431 0.93878527 0.98039216 0.9742946 0.96498101 0.97467167 0.97467167] mean value: 0.969014931036691 key: test_jcc value: [0.63636364 0.5 0.5 0.6 0.81818182 0.7 0.63636364 0.75 1. 0.75 ] mean value: 0.6890909090909091 key: train_jcc value: [0.97619048 0.96470588 0.96470588 0.95348837 0.91954023 0.97619048 0.96428571 0.95238095 0.96428571 0.96428571] mean value: 0.960005941430301 MCC on Blind test: 0.59 Accuracy on Blind test: 0.81 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, 
booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.22938943 0.23086452 0.25409341 0.41145706 0.33719325 0.23104048 0.12042832 0.26716518 0.23124266 0.19360614] mean value: 0.250648045539856 key: score_time value: [0.02094197 0.02873898 0.02438402 0.02471972 0.01987958 0.02118778 0.0118742 0.0221138 0.01199412 0.02212453] mean value: 0.020795869827270507 key: test_mcc value: [0.44444444 0.27216553 0.16666667 0.66666667 0.73854895 0.6000992 0.44444444 0.09449112 0.74535599 0.54772256] mean value: 0.4720605561480509 key: train_mcc value: [0.96845676 0.98416472 0.95286855 0.98416472 0.88847246 0.96845676 0.95223938 0.96842355 0.95281321 0.95281321] mean value: 0.9572873336128294 key: test_accuracy value: [0.73333333 0.66666667 0.6 0.8 0.86666667 0.8 0.73333333 0.53333333 0.85714286 0.78571429] mean value: 0.7376190476190476 key: train_accuracy value: [0.98496241 0.9924812 0.97744361 0.9924812 0.94736842 0.98496241 0.97744361 0.98496241 0.97761194 0.97761194] mean value: 0.9797329143754909 key: test_fscore value: [0.77777778 0.76190476 0.66666667 0.8 0.9 0.82352941 0.77777778 0.58823529 0.875 0.85714286] mean value: 0.7828034547152194 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:107: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:110: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: train_fscore value: [0.98795181 0.99393939 0.98203593 0.99393939 0.95808383 0.98795181 0.98181818 0.98765432 0.98181818 0.98181818] mean value: 0.983701102925786 key: test_precision value: [0.77777778 0.66666667 0.66666667 1. 0.81818182 0.875 0.77777778 0.71428571 1. 0.75 ] mean value: 0.8046356421356421 key: train_precision value: [0.97619048 0.98795181 0.96470588 0.98795181 0.94117647 0.97619048 0.97590361 0.98765432 0.97590361 0.97590361] mean value: 0.9749532084141108 key: test_recall value: [0.77777778 0.88888889 0.66666667 0.66666667 1. 0.77777778 0.77777778 0.5 0.77777778 1. ] mean value: 0.7833333333333333 key: train_recall value: [1. 1. 1. 1. 0.97560976 1. 0.98780488 0.98765432 0.98780488 0.98780488] mean value: 0.9926678711231557 key: test_roc_auc value: [0.72222222 0.61111111 0.58333333 0.83333333 0.83333333 0.80555556 0.72222222 0.55 0.88888889 0.7 ] mean value: 0.725 key: train_roc_auc value: [0.98039216 0.99019608 0.97058824 0.99019608 0.93878527 0.98039216 0.9742946 0.98421178 0.97467167 0.97467167] mean value: 0.9758399687440816 key: test_jcc value: [0.63636364 0.61538462 0.5 0.66666667 0.81818182 0.7 0.63636364 0.41666667 0.77777778 0.75 ] mean value: 0.6517404817404817 key: train_jcc value: [0.97619048 0.98795181 0.96470588 0.98795181 0.91954023 0.97619048 0.96428571 0.97560976 0.96428571 0.96428571] mean value: 0.9680997578031486 MCC on Blind test: 0.59 Accuracy on Blind test: 0.81 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 
'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.0408802 0.0270524 0.02707195 0.02973795 0.03253603 0.04366565 0.03498888 0.06585813 0.06901002 0.06005859] mean value: 0.04308598041534424 key: score_time value: [0.01190782 0.01165295 0.01165676 0.01165891 0.01165366 0.01169753 0.01182747 0.01180315 0.01178432 0.01181316] mean value: 0.011745572090148926 key: test_mcc value: [0.68888889 0.58655573 0.70710678 0.67082039 0.70710678 0.47140452 0.67082039 0.11111111 0.67082039 0.67082039] mean value: 0.5955455381604875 key: train_mcc value: [0.86510087 0.85275519 0.86591805 0.89031011 0.85365854 0.89031011 0.87804878 0.89031011 0.86591805 0.87909532] mean value: 0.8731425135908 key: test_accuracy value: [0.84210526 0.78947368 0.83333333 0.83333333 0.83333333 0.72222222 0.83333333 0.55555556 0.83333333 0.83333333] mean value: 0.7909356725146199 key: train_accuracy value: [0.93251534 0.92638037 0.93292683 0.94512195 0.92682927 0.94512195 0.93902439 0.94512195 0.93292683 0.93902439] mean value: 0.9364993266497081 key: test_fscore value: [0.84210526 0.81818182 0.8 0.84210526 0.85714286 0.66666667 0.82352941 0.55555556 0.82352941 0.82352941] mean value: 0.7852345659156805 key: train_fscore value: [0.93251534 0.92592593 0.93251534 0.94478528 0.92682927 0.94478528 0.93902439 0.94478528 0.93251534 0.9375 ] mean value: 0.9361181424953309 key: test_precision value: [0.8 0.75 1. 
0.8 0.75 0.83333333 0.875 0.55555556 0.875 0.875 ] mean value: 0.8113888888888889 key: train_precision value: [0.9382716 0.92592593 0.9382716 0.95061728 0.92682927 0.95061728 0.93902439 0.95061728 0.9382716 0.96153846] mean value: 0.941998471266764 key: test_recall value: [0.88888889 0.9 0.66666667 0.88888889 1. 0.55555556 0.77777778 0.55555556 0.77777778 0.77777778] mean value: 0.7788888888888889 key: train_recall value: [0.92682927 0.92592593 0.92682927 0.93902439 0.92682927 0.93902439 0.93902439 0.93902439 0.92682927 0.91463415] mean value: 0.9303974706413731 key: test_roc_auc value: [0.84444444 0.78333333 0.83333333 0.83333333 0.83333333 0.72222222 0.83333333 0.55555556 0.83333333 0.83333333] mean value: 0.7905555555555556 key: train_roc_auc value: [0.93255044 0.9263776 0.93292683 0.94512195 0.92682927 0.94512195 0.93902439 0.94512195 0.93292683 0.93902439] mean value: 0.9365025594700391 key: test_jcc value: [0.72727273 0.69230769 0.66666667 0.72727273 0.75 0.5 0.7 0.38461538 0.7 0.7 ] mean value: 0.6548135198135198 key: train_jcc value: [0.87356322 0.86206897 0.87356322 0.89534884 0.86363636 0.89534884 0.88505747 0.89534884 0.87356322 0.88235294] mean value: 0.8799851908394765 MCC on Blind test: 0.4 Accuracy on Blind test: 0.73 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [1.8908112 0.95630956 1.52429509 1.28793001 1.42368722 1.88998628 0.99083161 1.84427047 1.74302101 2.14582276] mean value: 1.5696965217590333 key: score_time value: [0.01293707 0.01447511 0.01205897 0.01240826 0.01333857 0.01755524 0.01313376 0.0169878 0.02549314 0.01345563] mean value: 0.015184354782104493 key: test_mcc value: [0.78888889 0.68543653 0.77777778 0.67082039 0.70710678 0.56980288 0.47140452 0.34188173 0.56980288 0.89442719] mean value: 0.6477349573707021 key: train_mcc value: [1. 1. 1. 0.95121951 1. 1. 1. 1. 1. 1. ] mean value: 0.9951219512195122 key: test_accuracy value: [0.89473684 0.84210526 0.88888889 0.83333333 0.83333333 0.77777778 0.72222222 0.66666667 0.77777778 0.94444444] mean value: 0.8181286549707603 key: train_accuracy value: [1. 1. 1. 0.97560976 1. 1. 1. 1. 1. 1. ] mean value: 0.9975609756097561 key: test_fscore value: [0.88888889 0.85714286 0.88888889 0.84210526 0.85714286 0.75 0.66666667 0.625 0.75 0.94736842] mean value: 0.8073203842940685 key: train_fscore value: [1. 1. 1. 0.97560976 1. 1. 1. 1. 1. 1. ] mean value: 0.9975609756097561 key: test_precision value: [0.88888889 0.81818182 0.88888889 0.8 0.75 0.85714286 0.83333333 0.71428571 0.85714286 0.9 ] mean value: 0.8307864357864357 key: train_precision value: [1. 1. 1. 0.97560976 1. 1. 1. 1. 1. 1. ] mean value: 0.9975609756097561 key: test_recall value: [0.88888889 0.9 0.88888889 0.88888889 1. 0.66666667 0.55555556 0.55555556 0.66666667 1. ] mean value: 0.8011111111111111 key: train_recall value: [1. 1. 1. 0.97560976 1. 1. 1. 1. 1. 1. 
] mean value: 0.9975609756097561 key: test_roc_auc value: [0.89444444 0.83888889 0.88888889 0.83333333 0.83333333 0.77777778 0.72222222 0.66666667 0.77777778 0.94444444] mean value: 0.8177777777777777 key: train_roc_auc value: [1. 1. 1. 0.97560976 1. 1. 1. 1. 1. 1. ] mean value: 0.9975609756097561 key: test_jcc value: [0.8 0.75 0.8 0.72727273 0.75 0.6 0.5 0.45454545 0.6 0.9 ] mean value: 0.6881818181818182 key: train_jcc value: [1. 1. 1. 0.95238095 1. 1. 1. 1. 1. 1. ] mean value: 0.9952380952380953 MCC on Blind test: 0.66 Accuracy on Blind test: 0.84 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0283339 0.01366544 0.01121616 0.00976849 0.01593399 0.01254296 0.01012063 0.0107913 0.01603723 0.01076508] mean value: 0.01391751766204834 key: score_time value: [0.01344252 0.01378036 0.00989604 0.00934839 0.01519752 0.01089883 0.00986218 0.01424837 0.0136404 0.00985265] mean value: 0.012016725540161134 key: test_mcc value: [0.28752732 0.01807754 0.4472136 0.34188173 0. 
0.70710678 0.34188173 0.1490712 0.34188173 0.53452248] mean value: 0.3169164101692284 key: train_mcc value: [0.47805638 0.42302501 0.55048188 0.57527066 0.51116565 0.57282438 0.42305348 0.52414242 0.47172818 0.58760938] mean value: 0.5117357414354574 key: test_accuracy value: [0.63157895 0.52631579 0.66666667 0.66666667 0.5 0.83333333 0.66666667 0.55555556 0.66666667 0.72222222] mean value: 0.6435672514619883 key: train_accuracy value: [0.72392638 0.65030675 0.74390244 0.78658537 0.7195122 0.76219512 0.67682927 0.74390244 0.7195122 0.76219512] mean value: 0.7288867275175819 key: test_fscore value: [0.66666667 0.66666667 0.75 0.7 0.64 0.85714286 0.7 0.66666667 0.7 0.7826087 ] mean value: 0.7129751552795032 key: train_fscore value: [0.76683938 0.73972603 0.79207921 0.79532164 0.77669903 0.80203046 0.74641148 0.78350515 0.7628866 0.80597015] mean value: 0.777146912204694 key: test_precision value: [0.58333333 0.52941176 0.6 0.63636364 0.5 0.75 0.63636364 0.53333333 0.63636364 0.64285714] mean value: 0.6048026483320601 key: train_precision value: [0.66666667 0.58695652 0.66666667 0.76404494 0.64516129 0.68695652 0.61417323 0.67857143 0.66071429 0.68067227] mean value: 0.6650583822494134 key: test_recall value: [0.77777778 0.9 1. 0.77777778 0.88888889 1. 0.77777778 0.88888889 0.77777778 1. ] mean value: 0.8788888888888888 key: train_recall value: [0.90243902 1. 
0.97560976 0.82926829 0.97560976 0.96341463 0.95121951 0.92682927 0.90243902 0.98780488] mean value: 0.9414634146341463 key: test_roc_auc value: [0.63888889 0.50555556 0.66666667 0.66666667 0.5 0.83333333 0.66666667 0.55555556 0.66666667 0.72222222] mean value: 0.6422222222222222 key: train_roc_auc value: [0.72282445 0.65243902 0.74390244 0.78658537 0.7195122 0.76219512 0.67682927 0.74390244 0.7195122 0.76219512] mean value: 0.7289897621198435 key: test_jcc value: [0.5 0.5 0.6 0.53846154 0.47058824 0.75 0.53846154 0.5 0.53846154 0.64285714] mean value: 0.5578829993535875 key: train_jcc value: [0.62184874 0.58695652 0.6557377 0.66019417 0.63492063 0.66949153 0.59541985 0.6440678 0.61666667 0.675 ] mean value: 0.6360303611859688 MCC on Blind test: 0.34 Accuracy on Blind test: 0.7 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, 
num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01672673 0.01119757 0.01112366 0.01720643 0.01067734 0.01029587 0.01684093 0.01211858 0.01282406 0.01079941] mean value: 0.012981057167053223 key: score_time value: [0.01407647 0.01021147 0.01025724 0.01490736 0.01006126 0.0094769 0.01515317 0.01026034 0.0116756 0.01325488] mean value: 0.011933469772338867 key: test_mcc value: [0.25844328 0.16854997 0.47140452 0.4472136 0.34188173 0.1490712 0.47140452 0.12403473 0.34188173 0.3721042 ] mean value: 0.3145989478923389 key: train_mcc value: [0.49487065 0.53197363 0.49507377 0.52223297 0.52223297 0.553295 0.525 0.55060372 0.525 0.49938477] mean value: 0.5219667468393889 key: test_accuracy value: [0.63157895 0.57894737 0.72222222 0.72222222 0.66666667 0.55555556 0.72222222 0.55555556 0.66666667 0.66666667] mean value: 0.6488304093567251 key: train_accuracy value: [0.74233129 0.75460123 0.74390244 0.75609756 0.75609756 0.76829268 0.75609756 0.76219512 0.75609756 0.73780488] mean value: 0.7533517881191082 key: test_fscore value: [0.58823529 0.55555556 0.66666667 0.70588235 0.625 0.33333333 0.66666667 0.42857143 0.625 0.57142857] mean value: 0.5766339869281046 key: train_fscore value: [0.71621622 0.71014493 0.72 0.72972973 0.72972973 0.73611111 0.7260274 0.71942446 0.7260274 0.69064748] mean value: 0.720405845128961 key: test_precision value: [0.625 0.625 0.83333333 0.75 0.71428571 0.66666667 0.83333333 0.6 0.71428571 0.8 ] mean value: 0.7161904761904762 key: train_precision value: [0.8030303 0.85964912 0.79411765 0.81818182 0.81818182 0.85483871 0.828125 0.87719298 0.828125 0.84210526] mean value: 0.8323547664551235 key: 
test_recall value: [0.55555556 0.5 0.55555556 0.66666667 0.55555556 0.22222222 0.55555556 0.33333333 0.55555556 0.44444444] mean value: 0.49444444444444446 key: train_recall value: [0.64634146 0.60493827 0.65853659 0.65853659 0.65853659 0.64634146 0.64634146 0.6097561 0.64634146 0.58536585] mean value: 0.6361035832580548 key: test_roc_auc value: [0.62777778 0.58333333 0.72222222 0.72222222 0.66666667 0.55555556 0.72222222 0.55555556 0.66666667 0.66666667] mean value: 0.6488888888888888 key: train_roc_auc value: [0.74292382 0.75368865 0.74390244 0.75609756 0.75609756 0.76829268 0.75609756 0.76219512 0.75609756 0.73780488] mean value: 0.753319783197832 key: test_jcc value: [0.41666667 0.38461538 0.5 0.54545455 0.45454545 0.2 0.5 0.27272727 0.45454545 0.4 ] mean value: 0.4128554778554778 key: train_jcc value: [0.55789474 0.5505618 0.5625 0.57446809 0.57446809 0.58241758 0.56989247 0.56179775 0.56989247 0.52747253] mean value: 0.5631365513743338 MCC on Blind test: 0.2 Accuracy on Blind test: 0.59 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, 
gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.01319885 0.01149154 0.01570868 0.01301765 0.01050162 0.01313496 0.00960803 0.00898933 0.01465893 0.00987673] mean value: 0.012018632888793946 key: score_time value: [0.01682305 0.01638246 0.02695346 0.01870203 0.01510835 0.01587963 0.01642799 0.01527309 0.02283025 0.01292133] mean value: 0.017730164527893066 key: test_mcc value: [ 0.25844328 0.06900656 -0.11396058 0.34188173 0.33333333 0. 0.11396058 0. 0.33333333 0.4472136 ] mean value: 0.17832118280906833 key: train_mcc value: [0.48871836 0.51136091 0.50093211 0.46563593 0.48377268 0.46563593 0.4539621 0.47994775 0.47850059 0.43072234] mean value: 0.4759188688472724 key: test_accuracy value: [0.63157895 0.52631579 0.44444444 0.66666667 0.66666667 0.5 0.55555556 0.5 0.66666667 0.72222222] mean value: 0.5880116959064328 key: train_accuracy value: [0.74233129 0.75460123 0.75 0.73170732 0.73780488 0.73170732 0.72560976 0.73780488 0.73780488 0.71341463] mean value: 0.7362786173874009 key: test_fscore value: [0.58823529 0.47058824 0.375 0.625 0.66666667 0.18181818 0.5 0.30769231 0.66666667 0.70588235] mean value: 0.5087549705196764 key: train_fscore value: [0.72727273 0.74025974 0.74213836 0.71794872 0.7114094 0.71794872 0.70967742 0.71895425 0.72258065 0.69281046] mean value: 0.7201000434581415 key: test_precision value: [0.625 0.57142857 0.42857143 0.71428571 0.66666667 0.5 0.57142857 0.5 0.66666667 0.75 ] mean value: 0.5994047619047619 key: train_precision value: [0.77777778 0.78082192 0.76623377 0.75675676 0.79104478 0.75675676 0.75342466 0.77464789 0.76712329 0.74647887] mean value: 0.7671066457221539 key: 
test_recall value: [0.55555556 0.4 0.33333333 0.55555556 0.66666667 0.11111111 0.44444444 0.22222222 0.66666667 0.66666667] mean value: 0.4622222222222222 key: train_recall value: [0.68292683 0.7037037 0.7195122 0.68292683 0.64634146 0.68292683 0.67073171 0.67073171 0.68292683 0.64634146] mean value: 0.6789069557362241 key: test_roc_auc value: [0.62777778 0.53333333 0.44444444 0.66666667 0.66666667 0.5 0.55555556 0.5 0.66666667 0.72222222] mean value: 0.5883333333333334 key: train_roc_auc value: [0.74269798 0.75429088 0.75 0.73170732 0.73780488 0.73170732 0.72560976 0.73780488 0.73780488 0.71341463] mean value: 0.7362842517314063 key: test_jcc value: [0.41666667 0.30769231 0.23076923 0.45454545 0.5 0.1 0.33333333 0.18181818 0.5 0.54545455] mean value: 0.35702797202797204 key: train_jcc value: [0.57142857 0.58762887 0.59 0.56 0.55208333 0.56 0.55 0.56122449 0.56565657 0.53 ] mean value: 0.562802182619377 MCC on Blind test: -0.01 Accuracy on Blind test: 0.51 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01490951 0.01831555 0.01540542 0.01234603 0.01195741 0.01603913 0.01533437 0.01074004 0.01064444 0.01078343] mean value: 0.01364753246307373 key: score_time value: [0.01275778 0.01393151 0.01058316 0.01042771 0.01456451 0.01108956 0.01409864 0.00900912 0.00917506 0.00897956] mean value: 0.011461663246154784 key: test_mcc value: [0.38204659 0.03580574 0.56980288 0.77777778 0.47140452 0.3721042 0.55555556 0.34188173 0.55555556 0.47140452] mean value: 0.45333390783468236 key: train_mcc value: [0.73580611 0.78000692 0.74440079 0.76972494 0.75812978 0.80583738 0.76880738 0.78072006 0.75812978 0.78141806] mean value: 0.7682981208235539 key: test_accuracy value: [0.68421053 0.52631579 0.77777778 0.88888889 0.72222222 0.66666667 0.77777778 0.66666667 0.77777778 0.72222222] mean value: 0.7210526315789474 key: train_accuracy value: [0.86503067 0.88957055 0.87195122 0.88414634 0.87804878 0.90243902 0.88414634 0.8902439 0.87804878 0.8902439 ] mean value: 0.8833869519676791 key: test_fscore value: [0.7 0.60869565 0.75 0.88888889 0.76190476 0.57142857 0.77777778 0.625 0.77777778 0.66666667] mean value: 0.7128140096618357 key: train_fscore value: [0.85714286 0.88607595 0.86956522 0.88050314 0.87341772 0.9 0.88198758 0.89156627 0.87341772 0.8875 ] mean value: 0.8801176454293306 key: test_precision value: [0.63636364 0.53846154 0.85714286 0.88888889 0.66666667 0.8 0.77777778 0.71428571 0.77777778 0.83333333] mean value: 0.7490698190698191 key: train_precision value: [0.91666667 0.90909091 0.88607595 0.90909091 0.90789474 0.92307692 0.89873418 0.88095238 0.90789474 0.91025641] 
mean value: 0.9049733799400688 key: test_recall value: [0.77777778 0.7 0.66666667 0.88888889 0.88888889 0.44444444 0.77777778 0.55555556 0.77777778 0.55555556] mean value: 0.7033333333333334 key: train_recall value: [0.80487805 0.86419753 0.85365854 0.85365854 0.84146341 0.87804878 0.86585366 0.90243902 0.84146341 0.86585366] mean value: 0.8571514604034929 key: test_roc_auc value: [0.68888889 0.51666667 0.77777778 0.88888889 0.72222222 0.66666667 0.77777778 0.66666667 0.77777778 0.72222222] mean value: 0.7205555555555555 key: train_roc_auc value: [0.86540199 0.88941584 0.87195122 0.88414634 0.87804878 0.90243902 0.88414634 0.8902439 0.87804878 0.8902439 ] mean value: 0.8834086118638964 key: test_jcc value: [0.53846154 0.4375 0.6 0.8 0.61538462 0.4 0.63636364 0.45454545 0.63636364 0.5 ] mean value: 0.5618618881118881 key: train_jcc value: [0.75 0.79545455 0.76923077 0.78651685 0.7752809 0.81818182 0.78888889 0.80434783 0.7752809 0.79775281] mean value: 0.7860935308517135 MCC on Blind test: 0.31 Accuracy on Blind test: 0.68 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline:/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.69331741 0.64710665 0.66360855 0.86699152 0.70964789 0.75243115 0.78330445 0.70622683 0.71316576 0.76179647] mean value: 0.729759669303894 key: score_time value: [0.01211405 0.01341152 0.01725698 0.01392078 0.01340365 0.02822709 0.01344252 0.01362491 0.01334286 0.01341009] mean value: 0.015215444564819335 key: test_mcc value: [0.26666667 0.36803496 0.70710678 0.67082039 0.70710678 0.56980288 0.55555556 0.33333333 0.56980288 0.67082039] mean value: 0.5419050634007493 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.63157895 0.68421053 0.83333333 0.83333333 0.83333333 0.77777778 0.77777778 0.66666667 0.77777778 0.83333333] mean value: 0.7649122807017544 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.63157895 0.72727273 0.8 0.84210526 0.85714286 0.75 0.77777778 0.66666667 0.75 0.82352941] mean value: 0.7626073651151051 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6 0.66666667 1. 0.8 0.75 0.85714286 0.77777778 0.66666667 0.85714286 0.875 ] mean value: 0.7850396825396825 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.66666667 0.8 0.66666667 0.88888889 1. 0.66666667 0.77777778 0.66666667 0.66666667 0.77777778] mean value: 0.7577777777777778 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.63333333 0.67777778 0.83333333 0.83333333 0.83333333 0.77777778 0.77777778 0.66666667 0.77777778 0.83333333] mean value: 0.7644444444444444 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.46153846 0.57142857 0.66666667 0.72727273 0.75 0.6 0.63636364 0.5 0.6 0.7 ] mean value: 0.6213270063270063 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.34 Accuracy on Blind test: 0.7 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01788473 0.01593089 0.01349783 0.01348853 0.01319408 0.01268983 0.01400018 0.01222539 0.01735711 0.01341629] mean value: 0.014368486404418946 key: score_time value: [0.01218581 0.00902629 0.00874782 0.00860453 0.00856733 0.00931859 0.00903726 0.0087738 0.01255965 0.00958991] mean value: 0.009641098976135253 key: test_mcc value: [0.9 0.80507649 1. 0.67082039 0.56980288 1. 
0.67082039 0.33333333 0.89442719 0.79772404] mean value: 0.7642004714248192 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.89473684 1. 0.83333333 0.77777778 1. 0.83333333 0.66666667 0.94444444 0.88888889] mean value: 0.8786549707602339 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94736842 0.90909091 1. 0.84210526 0.8 1. 0.84210526 0.66666667 0.94117647 0.875 ] mean value: 0.8823512993714232 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.9 0.83333333 1. 0.8 0.72727273 1. 0.8 0.66666667 1. 1. ] mean value: 0.8727272727272728 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.88888889 0.88888889 1. 0.88888889 0.66666667 0.88888889 0.77777778] mean value: 0.9 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95 0.88888889 1. 0.83333333 0.77777778 1. 0.83333333 0.66666667 0.94444444 0.88888889] mean value: 0.8783333333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.9 0.83333333 1. 0.72727273 0.66666667 1. 0.72727273 0.5 0.88888889 0.77777778] mean value: 0.8021212121212121 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.78 Accuracy on Blind test: 0.89 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09235191 0.09098029 0.09063077 0.09804845 0.09962702 0.10106015 0.09401655 0.09447575 0.09316206 0.09071183] mean value: 0.09450647830963135 key: score_time value: [0.0171411 0.01724839 0.01732278 0.01876092 0.01842284 0.01923037 0.01827502 0.0200417 0.01728535 0.01696825] mean value: 0.01806967258453369 key: test_mcc value: [0.47777778 0.39056329 0.67082039 0.67082039 0.3721042 0.70710678 0.4472136 0.11111111 0.89442719 0.77777778] mean value: 0.5519722513396789 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 0.68421053 0.83333333 0.83333333 0.66666667 0.83333333 0.72222222 0.55555556 0.94444444 0.88888889] mean value: 0.7698830409356725 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73684211 0.75 0.82352941 0.84210526 0.72727273 0.8 0.70588235 0.55555556 0.94736842 0.88888889] mean value: 0.7777444725896738 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.7 0.64285714 0.875 0.8 0.61538462 1. 
0.75 0.55555556 0.9 0.88888889] mean value: 0.7727686202686203 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 0.9 0.77777778 0.88888889 0.88888889 0.66666667 0.66666667 0.55555556 1. 0.88888889] mean value: 0.8011111111111111 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73888889 0.67222222 0.83333333 0.83333333 0.66666667 0.83333333 0.72222222 0.55555556 0.94444444 0.88888889] mean value: 0.7688888888888888 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58333333 0.6 0.7 0.72727273 0.57142857 0.66666667 0.54545455 0.38461538 0.9 0.8 ] mean value: 0.6478771228771228 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, 
min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00902104 0.00883174 0.00882602 0.00874376 0.00882483 0.00876188 0.00883603 0.00873995 0.00897932 0.00881672] mean value: 0.008838129043579102 key: score_time value: [0.00852895 0.00839901 0.0083952 0.00833488 0.0083344 0.0084455 0.00834584 0.00843763 0.00840974 0.00853491] mean value: 0.00841660499572754 key: test_mcc value: [0.28752732 0.71611487 0.12403473 0.4472136 0.4472136 0.62017367 0.34188173 0.11396058 0.56980288 0.47140452] mean value: 0.413932749789502 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.63157895 0.84210526 0.55555556 0.72222222 0.72222222 0.77777778 0.66666667 0.55555556 0.77777778 0.72222222] mean value: 0.6973684210526315 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.86956522 0.42857143 0.73684211 0.73684211 0.71428571 0.625 0.6 0.75 0.66666667] mean value: 0.6794439904108096 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.58333333 0.76923077 0.6 0.7 0.7 1. 0.71428571 0.54545455 0.85714286 0.83333333] mean value: 0.7302780552780552 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 1. 0.33333333 0.77777778 0.77777778 0.55555556 0.55555556 0.66666667 0.66666667 0.55555556] mean value: 0.6666666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.63888889 0.83333333 0.55555556 0.72222222 0.72222222 0.77777778 0.66666667 0.55555556 0.77777778 0.72222222] mean value: 0.6972222222222222 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.76923077 0.27272727 0.58333333 0.58333333 0.55555556 0.45454545 0.42857143 0.6 0.5 ] mean value: 0.5247297147297147 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.22 Accuracy on Blind test: 0.65 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.20718908 1.40801001 1.27958941 1.28461361 1.32606244 1.32907915 1.33875179 1.25243473 1.26152778 1.24775457] mean value: 1.2935012578964233 key: score_time value: [0.14160895 0.09528899 0.1088841 0.10485816 0.09723663 0.09495139 0.09413624 0.09549427 0.09452558 0.09518194] mean value: 0.10221662521362304 key: test_mcc value: [0.89893315 0.58655573 0.79772404 0.67082039 0.47140452 0.79772404 0.56980288 0.33333333 0.89442719 0.77777778] mean value: 0.6798503044277107 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.94736842 0.78947368 0.88888889 0.83333333 0.72222222 0.88888889 0.77777778 0.66666667 0.94444444 0.88888889] mean value: 0.8347953216374269 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.81818182 0.875 0.84210526 0.76190476 0.875 0.75 0.66666667 0.94736842 0.88888889] mean value: 0.8366292290440898 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.75 1. 0.8 0.66666667 1. 0.85714286 0.66666667 0.9 0.88888889] mean value: 0.8529365079365079 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 0.9 0.77777778 0.88888889 0.88888889 0.77777778 0.66666667 0.66666667 1. 0.88888889] mean value: 0.8344444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.78333333 0.88888889 0.83333333 0.72222222 0.88888889 0.77777778 0.66666667 0.94444444 0.88888889] mean value: 0.8338888888888889 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( value: [0.88888889 0.69230769 0.77777778 0.72727273 0.61538462 0.77777778 0.6 0.5 0.9 0.8 ] mean value: 0.727940947940948 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.59 Accuracy on Blind test: 0.81 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.86686087 0.89369035 0.86090946 0.92003894 0.87380433 0.89239073 0.92993355 0.87646794 0.89022374 0.97747231] mean value: 0.8981792211532593 key: score_time value: [0.15744424 0.19940758 0.16025209 0.1420064 0.14563394 0.15724635 0.2215755 0.11703801 0.16666937 0.12801671] mean value: 0.1595290184020996 key: test_mcc value: [0.78888889 0.68543653 0.70710678 0.67082039 0.3721042 0.79772404 0.89442719 0.2236068 0.67082039 0.77777778] mean value: 0.6588712988925703 key: train_mcc value: [0.96325856 0.96385008 0.93909422 0.95150257 0.93965346 0.92710507 0.92710507 0.96406004 0.93965346 0.95150257] mean value: 0.9466785096847014 key: test_accuracy value: [0.89473684 0.84210526 0.83333333 0.83333333 0.66666667 0.88888889 0.94444444 0.61111111 0.83333333 0.88888889] mean value: 0.8236842105263158 key: train_accuracy value: [0.98159509 0.98159509 0.9695122 0.97560976 0.9695122 0.96341463 0.96341463 0.98170732 0.9695122 0.97560976] mean 
value: 0.9731482866975909 key: test_fscore value: [0.88888889 0.85714286 0.8 0.84210526 0.72727273 0.875 0.94117647 0.58823529 0.84210526 0.88888889] mean value: 0.8250815653215035 key: train_fscore value: [0.98181818 0.98181818 0.96969697 0.97590361 0.97005988 0.96385542 0.96385542 0.98203593 0.97005988 0.97590361] mean value: 0.9735007094245244 key: test_precision value: [0.88888889 0.81818182 1. 0.8 0.61538462 1. 1. 0.625 0.8 0.88888889] mean value: 0.8436344211344211 key: train_precision value: [0.97590361 0.96428571 0.96385542 0.96428571 0.95294118 0.95238095 0.95238095 0.96470588 0.95294118 0.96428571] mean value: 0.9607966319057744 key: test_recall value: [0.88888889 0.9 0.66666667 0.88888889 0.88888889 0.77777778 0.88888889 0.55555556 0.88888889 0.88888889] mean value: 0.8233333333333333 key: train_recall value: [0.98780488 1. 0.97560976 0.98780488 0.98780488 0.97560976 0.97560976 1. 0.98780488 0.98780488] mean value: 0.9865853658536585 key: test_roc_auc value: [0.89444444 0.83888889 0.83333333 0.83333333 0.66666667 0.88888889 0.94444444 0.61111111 0.83333333 0.88888889] mean value: 0.8233333333333333 key: train_roc_auc value: [0.98155676 0.98170732 0.9695122 0.97560976 0.9695122 0.96341463 0.96341463 0.98170732 0.9695122 0.97560976] mean value: 0.9731556760012045 key: test_jcc value: [0.8 0.75 0.66666667 0.72727273 0.57142857 0.77777778 0.88888889 0.41666667 0.72727273 0.8 ] mean value: 0.7125974025974026 key: train_jcc value: [0.96428571 0.96428571 0.94117647 0.95294118 0.94186047 0.93023256 0.93023256 0.96470588 0.94186047 0.95294118] mean value: 0.9484522180965409 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), 
('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 
'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02150226 0.00884724 0.00999117 0.0089252 0.00994086 0.00981355 0.00895786 0.00987864 0.01000977 0.00989366] mean value: 0.010776019096374512 key: score_time value: [0.01239491 0.00845385 0.00954509 0.00856543 0.00893474 0.00922799 0.0086267 0.00930977 0.00923538 0.00921655] mean value: 0.009351038932800293 key: test_mcc value: [0.25844328 0.16854997 0.47140452 0.4472136 0.34188173 0.1490712 0.47140452 0.12403473 0.34188173 0.3721042 ] mean value: 0.3145989478923389 key: train_mcc value: [0.49487065 0.53197363 0.49507377 0.52223297 0.52223297 0.553295 0.525 0.55060372 0.525 0.49938477] mean value: 0.5219667468393889 key: test_accuracy value: [0.63157895 0.57894737 0.72222222 0.72222222 0.66666667 0.55555556 0.72222222 0.55555556 0.66666667 0.66666667] mean value: 0.6488304093567251 key: train_accuracy value: [0.74233129 0.75460123 0.74390244 0.75609756 0.75609756 0.76829268 0.75609756 0.76219512 0.75609756 0.73780488] mean value: 0.7533517881191082 key: test_fscore value: [0.58823529 0.55555556 0.66666667 0.70588235 0.625 0.33333333 0.66666667 0.42857143 0.625 0.57142857] mean value: 0.5766339869281046 key: train_fscore value: [0.71621622 0.71014493 0.72 0.72972973 0.72972973 0.73611111 0.7260274 0.71942446 0.7260274 0.69064748] mean value: 0.720405845128961 key: test_precision value: [0.625 0.625 0.83333333 0.75 0.71428571 0.66666667 0.83333333 0.6 0.71428571 0.8 ] mean value: 0.7161904761904762 key: train_precision value: [0.8030303 0.85964912 0.79411765 0.81818182 0.81818182 0.85483871 0.828125 0.87719298 0.828125 0.84210526] mean 
value: 0.8323547664551235 key: test_recall value: [0.55555556 0.5 0.55555556 0.66666667 0.55555556 0.22222222 0.55555556 0.33333333 0.55555556 0.44444444] mean value: 0.49444444444444446 key: train_recall value: [0.64634146 0.60493827 0.65853659 0.65853659 0.65853659 0.64634146 0.64634146 0.6097561 0.64634146 0.58536585] mean value: 0.6361035832580548 key: test_roc_auc value: [0.62777778 0.58333333 0.72222222 0.72222222 0.66666667 0.55555556 0.72222222 0.55555556 0.66666667 0.66666667] mean value: 0.6488888888888888 key: train_roc_auc value: [0.74292382 0.75368865 0.74390244 0.75609756 0.75609756 0.76829268 0.75609756 0.76219512 0.75609756 0.73780488] mean value: 0.753319783197832 key: test_jcc value: [0.41666667 0.38461538 0.5 0.54545455 0.45454545 0.2 0.5 0.27272727 0.45454545 0.4 ] mean value: 0.4128554778554778 key: train_jcc value: [0.55789474 0.5505618 0.5625 0.57446809 0.57446809 0.58241758 0.56989247 0.56179775 0.56989247 0.52747253] mean value: 0.5631365513743338 MCC on Blind test: 0.2 Accuracy on Blind test: 0.59 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09501481 0.23909044 0.12473583 0.04655695 0.06785107 0.07981181 0.39118195 0.06984305 0.06249595 0.29534435] mean value: 0.1471926212310791 key: score_time value: [0.01050901 0.01090145 0.01074934 0.01110911 0.01052189 0.01095223 0.01346135 0.01077437 0.01050735 0.01233649] mean value: 0.011182260513305665 key: test_mcc value: [0.89893315 0.89893315 1. 0.77777778 0.56980288 0.89442719 0.79772404 0.4472136 1. 1. ] mean value: 0.8284811781695286 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.94736842 1. 0.88888889 0.77777778 0.94444444 0.88888889 0.72222222 1. 1. ] mean value: 0.9116959064327486 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.95238095 1. 0.88888889 0.8 0.94117647 0.875 0.73684211 1. 1. ] mean value: 0.9135464887709469 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.90909091 1. 0.88888889 0.72727273 1. 1. 0.7 1. 1. ] mean value: 0.9225252525252525 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 1. 1. 0.88888889 0.88888889 0.88888889 0.77777778 0.77777778 1. 1. ] mean value: 0.9111111111111111 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.94444444 1. 0.88888889 0.77777778 0.94444444 0.88888889 0.72222222 1. 1. 
] mean value: 0.9111111111111111 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.90909091 1. 0.8 0.66666667 0.88888889 0.77777778 0.58333333 1. 1. ] mean value: 0.8514646464646465 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.03162622 0.05078006 0.06450272 0.02454734 0.05303431 0.04205704 0.03058195 0.03836989 0.05589056 0.04631686] mean value: 0.04377069473266602 key: score_time value: [0.02164102 0.01792717 0.01255107 0.01289773 0.0353806 0.01196766 0.01205635 0.02032638 0.02122998 0.01304936] mean value: 0.017902731895446777 key: test_mcc value: [0.68888889 0.57777778 0.79772404 0.67082039 0.56980288 0.4472136 0.4472136 0.11396058 0.34188173 0.67082039] mean value: 0.5326103867520664 key: train_mcc value: [1. 0.98780488 0.98787834 0.97560976 0.98787834 1. 0.98787834 1. 0.98787834 0.98787834] mean value: 0.9902806333682408 key: test_accuracy value: [0.84210526 0.78947368 0.88888889 0.83333333 0.77777778 0.72222222 0.72222222 0.55555556 0.66666667 0.83333333] mean value: 0.7631578947368421 key: train_accuracy value: [1. 0.99386503 0.99390244 0.98780488 0.99390244 1. 
0.99390244 1. 0.99390244 0.99390244] mean value: 0.9951182103845578 key: test_fscore value: [0.84210526 0.8 0.9 0.82352941 0.75 0.70588235 0.70588235 0.5 0.625 0.82352941] mean value: 0.747592879256966 key: train_fscore value: [1. 0.99386503 0.99393939 0.98780488 0.99386503 1. 0.99393939 1. 0.99393939 0.99393939] mean value: 0.9951292515156049 key: test_precision value: [0.8 0.8 0.81818182 0.875 0.85714286 0.75 0.75 0.57142857 0.71428571 0.875 ] mean value: 0.7811038961038961 key: train_precision value: [1. 0.98780488 0.98795181 0.98780488 1. 1. 0.98795181 1. 0.98795181 0.98795181] mean value: 0.9927416985013223 key: test_recall value: [0.88888889 0.8 1. 0.77777778 0.66666667 0.66666667 0.66666667 0.44444444 0.55555556 0.77777778] mean value: 0.7244444444444444 key: train_recall value: [1. 1. 1. 0.98780488 0.98780488 1. 1. 1. 1. 1. ] mean value: 0.9975609756097561 key: test_roc_auc value: [0.84444444 0.78888889 0.88888889 0.83333333 0.77777778 0.72222222 0.72222222 0.55555556 0.66666667 0.83333333] mean value: 0.7633333333333333 key: train_roc_auc value: [1. 0.99390244 0.99390244 0.98780488 0.99390244 1. 0.99390244 1. 0.99390244 0.99390244] mean value: 0.9951219512195122 key: test_jcc value: [0.72727273 0.66666667 0.81818182 0.7 0.6 0.54545455 0.54545455 0.33333333 0.45454545 0.7 ] mean value: 0.6090909090909091 key: train_jcc value: [1. 0.98780488 0.98795181 0.97590361 0.98780488 1. 0.98795181 1. 
0.98795181 0.98795181] mean value: 0.9903320599471055 MCC on Blind test: 0.31 Accuracy on Blind test: 0.65 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), 
('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02618909 0.00915051 0.00890017 0.00884986 0.00900197 0.00903964 0.00874496 0.00874972 0.00885129 0.00870752] mean value: 0.010618472099304199 key: score_time value: [0.01357293 0.00898504 0.00841975 0.00844479 0.00916314 0.00864339 0.00847054 0.00853252 0.0083828 0.00896931] mean value: 0.00915842056274414 key: test_mcc value: [0.05555556 0.36666667 0.67082039 0.2236068 0.3721042 0.47140452 0.56980288 0.2236068 0.2236068 0.4472136 ] mean value: 0.36243882110789005 key: train_mcc value: [0.47384761 0.49713703 0.45152179 0.45125307 0.44020439 0.51219512 0.41475753 0.47735225 0.50033496 0.50003718] mean value: 0.47186409350584424 key: test_accuracy value: [0.52631579 0.68421053 0.83333333 0.61111111 0.66666667 0.72222222 0.77777778 0.61111111 0.61111111 0.72222222] mean value: 0.6766081871345029 key: train_accuracy value: [0.73619632 0.74846626 0.72560976 0.72560976 0.7195122 0.75609756 0.70731707 0.73780488 0.75 0.75 ] mean value: 0.7356613796199312 key: test_fscore value: [0.52631579 0.7 0.84210526 0.63157895 0.72727273 0.66666667 0.75 0.63157895 0.63157895 0.70588235] mean value: 0.6812979641617413 key: train_fscore value: 
[0.74853801 0.74213836 0.73053892 0.72727273 0.72941176 0.75609756 0.71084337 0.74853801 0.75449102 0.75151515] mean value: 0.7399384906254795 key: test_precision value: [0.5 0.7 0.8 0.6 0.61538462 0.83333333 0.85714286 0.6 0.6 0.75 ] mean value: 0.6855860805860806 key: train_precision value: [0.71910112 0.75641026 0.71764706 0.72289157 0.70454545 0.75609756 0.70238095 0.71910112 0.74117647 0.74698795] mean value: 0.7286339518987338 key: test_recall value: [0.55555556 0.7 0.88888889 0.66666667 0.88888889 0.55555556 0.66666667 0.66666667 0.66666667 0.66666667] mean value: 0.6922222222222222 key: train_recall value: [0.7804878 0.72839506 0.74390244 0.73170732 0.75609756 0.75609756 0.7195122 0.7804878 0.76829268 0.75609756] mean value: 0.7521077988557663 key: test_roc_auc value: [0.52777778 0.68333333 0.83333333 0.61111111 0.66666667 0.72222222 0.77777778 0.61111111 0.61111111 0.72222222] mean value: 0.6766666666666666 key: train_roc_auc value: [0.73592291 0.74834387 0.72560976 0.72560976 0.7195122 0.75609756 0.70731707 0.73780488 0.75 0.75 ] mean value: 0.7356218006624511 key: test_jcc value: [0.35714286 0.53846154 0.72727273 0.46153846 0.57142857 0.5 0.6 0.46153846 0.46153846 0.54545455] mean value: 0.5224375624375625 key: train_jcc value: [0.59813084 0.59 0.5754717 0.57142857 0.57407407 0.60784314 0.55140187 0.59813084 0.60576923 0.60194175] mean value: 0.5874192010614671 MCC on Blind test: 0.31 Accuracy on Blind test: 0.68 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01138663 0.01504159 0.01759362 0.01546073 0.0162313 0.01648188 0.0172379 0.0149343 0.01540208 0.01473355] mean value: 0.015450358390808105 key: score_time value: [0.00881577 0.01157713 0.01190829 0.01156306 0.0117166 0.01163983 0.01195574 0.01151013 0.0118072 0.01173139] mean value: 0.011422514915466309 key: test_mcc value: [0.68888889 0.68543653 0.89442719 0.53452248 0.62017367 0.67082039 0.56980288 0.2236068 0.26726124 0.70710678] mean value: 0.5862046859894403 key: train_mcc value: [0.86816623 0.93871406 0.91798509 0.72987004 0.89565496 0.75955453 0.95235327 0.92793395 0.65275337 0.73970927] mean value: 0.8382694768223476 key: test_accuracy value: [0.84210526 0.84210526 0.94444444 0.72222222 0.77777778 0.83333333 0.77777778 0.61111111 0.61111111 0.83333333] mean value: 0.7795321637426901 key: train_accuracy value: [0.93251534 0.96932515 0.95731707 0.84756098 0.94512195 0.86585366 0.97560976 0.96341463 0.79878049 0.85365854] mean value: 0.9109157563968278 key: test_fscore value: [0.84210526 0.85714286 0.94736842 0.7826087 0.81818182 0.84210526 0.75 0.63157895 0.46153846 0.85714286] mean value: 0.778977258439501 key: train_fscore value: [0.93567251 0.9689441 0.95906433 0.86772487 0.94797688 0.88172043 0.975 0.96428571 0.7480916 0.87234043] mean value: 0.912082086080032 key: test_precision value: [0.8 0.81818182 0.9 0.64285714 0.69230769 0.8 0.85714286 0.6 0.75 0.75 ] mean value: 0.7610489510489511 key: train_precision value: [0.8988764 0.975 0.92134831 0.76635514 0.9010989 0.78846154 1. 0.94186047 1. 
0.77358491] mean value: 0.8966585669625136 key: test_recall value: [0.88888889 0.9 1. 1. 1. 0.88888889 0.66666667 0.66666667 0.33333333 1. ] mean value: 0.8344444444444444 key: train_recall value: [0.97560976 0.96296296 1. 1. 1. 1. 0.95121951 0.98780488 0.59756098 1. ] mean value: 0.9475158084914183 key: test_roc_auc value: [0.84444444 0.83888889 0.94444444 0.72222222 0.77777778 0.83333333 0.77777778 0.61111111 0.61111111 0.83333333] mean value: 0.7794444444444444 key: train_roc_auc value: [0.93224932 0.96928636 0.95731707 0.84756098 0.94512195 0.86585366 0.97560976 0.96341463 0.79878049 0.85365854] mean value: 0.9108852755194219 key: test_jcc value: [0.72727273 0.75 0.9 0.64285714 0.69230769 0.72727273 0.6 0.46153846 0.3 0.75 ] mean value: 0.6551248751248752 key: train_jcc value: [0.87912088 0.93975904 0.92134831 0.76635514 0.9010989 0.78846154 0.95121951 0.93103448 0.59756098 0.77358491] mean value: 0.844954368584343 MCC on Blind test: 0.44 Accuracy on Blind test: 0.73 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, 
gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01441765 0.01369786 0.01448107 0.01370692 0.01383114 0.01426673 0.01549292 0.01588058 0.01495886 0.01427841] mean value: 0.014501214027404785 key: score_time value: [0.01004958 0.01142454 0.01141286 0.01143146 0.01136494 0.01143551 0.01144528 0.01216745 0.01214123 0.01187587] mean value: 0.011474871635437011 key: test_mcc value: [0.78888889 0.50604808 0.67082039 0.62017367 0.79772404 0.53452248 0.56980288 0.23570226 0.24253563 0.70710678] mean value: 0.5673325099894828 key: train_mcc value: [0.87043375 0.67895422 0.77964295 0.75955453 0.70891756 0.44393726 1. 0.95150257 0.35112344 0.91798509] mean value: 0.7462051380346357 key: test_accuracy value: [0.89473684 0.73684211 0.83333333 0.77777778 0.88888889 0.72222222 0.77777778 0.61111111 0.55555556 0.83333333] mean value: 0.7631578947368421 key: train_accuracy value: [0.93251534 0.81595092 0.87804878 0.86585366 0.84146341 0.66463415 1. 0.97560976 0.6097561 0.95731707] mean value: 0.854114918449798 key: test_fscore value: [0.88888889 0.70588235 0.84210526 0.81818182 0.875 0.7826087 0.75 0.53333333 0.69230769 0.8 ] mean value: 0.7688308044462978 key: train_fscore value: [0.92903226 0.77272727 0.89130435 0.88172043 0.81690141 0.74885845 1. 0.97530864 0.71929825 0.95541401] mean value: 0.8690565064992889 key: test_precision value: [0.88888889 0.85714286 0.8 0.69230769 1. 0.64285714 0.85714286 0.66666667 0.52941176 1. ] mean value: 0.7934417869711987 key: train_precision value: [0.98630137 1. 0.80392157 0.78846154 0.96666667 0.59854015 1. 0.9875 0.56164384 1. 
] mean value: 0.869303512522051 key: test_recall value: [0.88888889 0.6 0.88888889 1. 0.77777778 1. 0.66666667 0.44444444 1. 0.66666667] mean value: 0.7933333333333333 key: train_recall value: [0.87804878 0.62962963 1. 1. 0.70731707 1. 1. 0.96341463 1. 0.91463415] mean value: 0.9093044263775971 key: test_roc_auc value: [0.89444444 0.74444444 0.83333333 0.77777778 0.88888889 0.72222222 0.77777778 0.61111111 0.55555556 0.83333333] mean value: 0.7638888888888888 key: train_roc_auc value: [0.93285155 0.81481481 0.87804878 0.86585366 0.84146341 0.66463415 1. 0.97560976 0.6097561 0.95731707] mean value: 0.8540349292381813 key: test_jcc value: [0.8 0.54545455 0.72727273 0.69230769 0.77777778 0.64285714 0.6 0.36363636 0.52941176 0.66666667] mean value: 0.6345384680678798 key: train_jcc value: [0.86746988 0.62962963 0.80392157 0.78846154 0.69047619 0.59854015 1. 0.95180723 0.56164384 0.91463415] mean value: 0.7806584163571848 MCC on Blind test: 0.56 Accuracy on Blind test: 0.78 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, 
gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.12057924 0.11223555 0.11333489 0.11369205 0.11219025 0.11550355 0.11615276 0.11330104 0.1124258 0.11430383] mean value: 0.1143718957901001 key: score_time value: [0.01463461 0.01489043 0.01503968 0.01504302 0.01546264 0.01508927 0.01510382 0.01506543 0.01499128 0.01474929] mean value: 0.015006947517395019 key: test_mcc value: [1. 0.80507649 1. 0.67082039 0.56980288 0.79772404 0.89442719 0.33333333 0.77777778 0.70710678] mean value: 0.755606887996258 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.89473684 1. 0.83333333 0.77777778 0.88888889 0.94444444 0.66666667 0.88888889 0.83333333] mean value: 0.8728070175438596 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.90909091 1. 0.84210526 0.8 0.875 0.94117647 0.66666667 0.88888889 0.8 ] mean value: 0.8722928198392594 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.83333333 1. 0.8 0.72727273 1. 1. 0.66666667 0.88888889 1. ] mean value: 0.8916161616161616 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.88888889 0.88888889 0.77777778 0.88888889 0.66666667 0.88888889 0.66666667] mean value: 0.8666666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.88888889 1. 0.83333333 0.77777778 0.88888889 0.94444444 0.66666667 0.88888889 0.83333333] mean value: 0.8722222222222222 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 
1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.83333333 1. 0.72727273 0.66666667 0.77777778 0.88888889 0.5 0.8 0.66666667] mean value: 0.786060606060606 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.66 Accuracy on Blind test: 0.84 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0368824 0.03443956 0.04048729 0.0401268 0.05609751 0.04586601 0.0384922 0.03691435 0.03752112 0.04061699] mean value: 0.04074442386627197 key: score_time value: [0.0196774 0.02022862 0.03363061 0.03083062 0.02206516 0.02526069 0.0358851 0.02022338 0.02485967 0.02349877] mean value: 0.025616002082824708 key: test_mcc value: [0.80507649 1. 0.89442719 0.79772404 0.70710678 0.77777778 0.67082039 0.2236068 0.79772404 1. ] mean value: 0.7674263497298501 key: train_mcc value: [1. 1. 0.97590007 1. 0.98787834 1. 0.96406004 1. 0.98787834 0.98787834] mean value: 0.990359513473492 key: test_accuracy value: [0.89473684 1. 0.94444444 0.88888889 0.83333333 0.88888889 0.83333333 0.61111111 0.88888889 1. ] mean value: 0.8783625730994152 key: train_accuracy value: [1. 1. 0.98780488 1. 0.99390244 1. 0.98170732 1. 
0.99390244 0.99390244] mean value: 0.9951219512195122 key: test_fscore value: [0.875 1. 0.94117647 0.875 0.85714286 0.88888889 0.82352941 0.58823529 0.875 1. ] mean value: 0.8723972922502334 key: train_fscore value: [1. 1. 0.98765432 1. 0.99386503 1. 0.98136646 1. 0.99393939 0.99393939] mean value: 0.9950764599168618 key: test_precision value: [1. 1. 1. 1. 0.75 0.88888889 0.875 0.625 1. 1. ] mean value: 0.9138888888888889 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 0.98795181 0.98795181] mean value: 0.9975903614457832 key: test_recall value: [0.77777778 1. 0.88888889 0.77777778 1. 0.88888889 0.77777778 0.55555556 0.77777778 1. ] mean value: 0.8444444444444444 key: train_recall value: [1. 1. 0.97560976 1. 0.98780488 1. 0.96341463 1. 1. 1. ] mean value: 0.9926829268292683 key: test_roc_auc value: [0.88888889 1. 0.94444444 0.88888889 0.83333333 0.88888889 0.83333333 0.61111111 0.88888889 1. ] mean value: 0.8777777777777778 key: train_roc_auc value: [1. 1. 0.98780488 1. 0.99390244 1. 0.98170732 1. 0.99390244 0.99390244] mean value: 0.9951219512195122 key: test_jcc value: [0.77777778 1. 0.88888889 0.77777778 0.75 0.8 0.7 0.41666667 0.77777778 1. ] mean value: 0.7888888888888889 key: train_jcc value: [1. 1. 0.97560976 1. 0.98780488 1. 0.96341463 1. 
0.98795181 0.98795181] mean value: 0.9902732882750515 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.02155781 0.02346563 0.02286315 0.04235959 0.02257943 0.02250385 0.02461076 0.05078363 0.05083275 0.02259588] mean value: 0.03041524887084961 key: score_time value: [0.01257849 0.01255059 0.01254272 0.02155042 0.01245618 0.01247454 0.01241016 0.02084184 0.02274609 0.01258111] mean value: 0.015273213386535645 key: test_mcc value: [0.26666667 0.16854997 0.70710678 0.55555556 0.67082039 0.4472136 0.34188173 0. 0.77777778 0.56980288] mean value: 0.4505375347229357 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.63157895 0.57894737 0.83333333 0.77777778 0.83333333 0.66666667 0.66666667 0.5 0.88888889 0.77777778] mean value: 0.7154970760233919 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.63157895 0.55555556 0.8 0.77777778 0.84210526 0.5 0.625 0.4 0.88888889 0.75 ] mean value: 0.6770906432748538 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.6 0.625 1. 0.77777778 0.8 1. 
0.71428571 0.5 0.88888889 0.85714286] mean value: 0.7763095238095238 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.5 0.66666667 0.77777778 0.88888889 0.33333333 0.55555556 0.33333333 0.88888889 0.66666667] mean value: 0.6277777777777778 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.63333333 0.58333333 0.83333333 0.77777778 0.83333333 0.66666667 0.66666667 0.5 0.88888889 0.77777778] mean value: 0.7161111111111111 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.46153846 0.38461538 0.66666667 0.63636364 0.72727273 0.33333333 0.45454545 0.25 0.8 0.6 ] mean value: 0.5314335664335664 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.18 Accuracy on Blind test: 0.59 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.37662029 0.35531259 0.3597064 0.36402702 0.3490901 0.35136151 0.35017991 0.34702373 0.34805918 0.34904552] mean value: 0.35504262447357177 key: score_time value: [0.00975657 0.00898051 0.00918102 0.0090487 0.00944591 0.00902224 0.00896931 0.00889969 0.00917578 0.00898147] mean value: 0.0091461181640625 key: test_mcc value: [0.89893315 0.71611487 0.79772404 0.77777778 0.79772404 0.89442719 0.77777778 0.4472136 0.89442719 1. ] mean value: 0.8002119627480698 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.84210526 0.88888889 0.88888889 0.88888889 0.94444444 0.88888889 0.72222222 0.94444444 1. ] mean value: 0.8956140350877193 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.86956522 0.875 0.88888889 0.9 0.94117647 0.88888889 0.73684211 0.94736842 1. ] mean value: 0.8988906462661342 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.76923077 1. 0.88888889 0.81818182 1. 0.88888889 0.7 0.9 1. ] mean value: 0.8965190365190365 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 1. 0.77777778 0.88888889 1. 0.88888889 0.88888889 0.77777778 1. 1. ] mean value: 0.9111111111111111 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.83333333 0.88888889 0.88888889 0.88888889 0.94444444 0.88888889 0.72222222 0.94444444 1. 
] mean value: 0.8944444444444444 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.76923077 0.77777778 0.8 0.81818182 0.88888889 0.8 0.58333333 0.9 1. ] mean value: 0.8226301476301476 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01750708 0.01992321 0.01931262 0.0193913 0.01945806 0.0195179 0.03629398 0.0204246 0.02689838 0.03091669] mean value: 0.02296438217163086 key: score_time value: [0.01183605 0.01174116 0.01174712 0.01317406 0.0134604 0.01328158 0.01193166 0.01465511 0.01805115 0.01541853] mean value: 0.013529682159423828 key: test_mcc value: [0.62994079 0.41773368 0.33333333 0.4472136 0.79772404 0.56980288 0.33333333 0.47140452 0.47140452 0.34188173] mean value: 0.48137724155149714 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.78947368 0.68421053 0.66666667 0.72222222 0.88888889 0.77777778 0.66666667 0.72222222 0.72222222 0.66666667] mean value: 0.7307017543859649 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 key: test_fscore value: [0.71428571 0.625 0.66666667 0.70588235 0.9 0.75 0.66666667 0.66666667 0.66666667 0.625 ] mean value: 0.6986834733893558 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.83333333 0.66666667 0.75 0.81818182 0.85714286 0.66666667 0.83333333 0.83333333 0.71428571] mean value: 0.7972943722943723 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.55555556 0.5 0.66666667 0.66666667 1. 0.66666667 0.66666667 0.55555556 0.55555556 0.55555556] mean value: 0.6388888888888888 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.77777778 0.69444444 0.66666667 0.72222222 0.88888889 0.77777778 0.66666667 0.72222222 0.72222222 0.66666667] mean value: 0.7305555555555555 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.55555556 0.45454545 0.5 0.54545455 0.81818182 0.6 0.5 0.5 0.5 0.45454545] mean value: 0.5428282828282829 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.0 Accuracy on Blind test: 0.62 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02956533 0.03639174 0.03307033 0.03308129 0.03312731 0.03301644 0.03308487 0.03315282 0.03304362 0.0330658 ] mean value: 0.03305995464324951 key: score_time value: [0.02278686 0.01986313 0.02063823 0.02110219 0.02177215 0.02229452 0.02278328 0.02294707 0.01153684 0.02025557] mean value: 0.02059798240661621 key: test_mcc value: [0.78888889 0.68543653 0.89442719 0.67082039 0.77777778 0.56980288 0.67082039 0.2236068 0.67082039 0.89442719] mean value: 0.6846828435302107 key: train_mcc value: [0.9509184 0.96326408 0.92682927 0.95121951 0.92682927 0.95121951 0.95121951 0.96348628 0.92682927 0.96348628] mean value: 0.9475301380884953 key: test_accuracy value: [0.89473684 0.84210526 0.94444444 0.83333333 0.88888889 0.77777778 0.83333333 0.61111111 0.83333333 0.94444444] mean value: 0.8403508771929824 key: train_accuracy value: [0.97546012 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98170732 0.96341463 0.98170732] mean value: 0.9737543019601975 key: test_fscore value: [0.88888889 0.85714286 0.94117647 0.84210526 0.88888889 0.75 0.82352941 0.63157895 0.82352941 0.94736842] mean value: 0.8394208560617229 key: 
train_fscore value: [0.97560976 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98181818 0.96341463 0.98159509] mean value: 0.973769129269653 key: test_precision value: [0.88888889 0.81818182 1. 0.8 0.88888889 0.85714286 0.875 0.6 0.875 0.9 ] mean value: 0.8503102453102453 key: train_precision value: [0.97560976 0.97560976 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.97590361 0.96341463 0.98765432] mean value: 0.9731850618372315 key: test_recall value: [0.88888889 0.9 0.88888889 0.88888889 0.88888889 0.66666667 0.77777778 0.66666667 0.77777778 1. ] mean value: 0.8344444444444444 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:128: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:131: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: train_recall value: [0.97560976 0.98765432 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98780488 0.96341463 0.97560976] mean value: 0.9743751881963264 key: test_roc_auc value: [0.89444444 0.83888889 0.94444444 0.83333333 0.88888889 0.77777778 0.83333333 0.61111111 0.83333333 0.94444444] mean value: 0.84 key: train_roc_auc value: [0.9754592 0.98163204 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98170732 0.96341463 0.98170732] mean value: 0.9737579042457091 key: test_jcc value: [0.8 0.75 0.88888889 0.72727273 0.8 0.6 0.7 0.46153846 0.7 0.9 ] mean value: 0.7327700077700078 key: train_jcc value: [0.95238095 0.96385542
0.92941176 0.95238095 0.92941176 0.95238095 0.95238095 0.96428571 0.92941176 0.96385542] mean value: 0.9489755661300665 MCC on Blind test: 0.71 Accuracy on Blind test: 0.86 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.24983573 0.25471854 0.21633697 0.21292329 0.22023106 0.21423078 0.21444273 0.33044195 0.28945279 0.34522271] mean value: 0.2547836542129517 key: score_time value: [0.03669548 0.01727962 0.02205777 0.02206111 0.02186513 0.02367711 0.02402163 0.02303267 0.02362227 0.0215323 ] mean value: 0.023584508895874025 key: test_mcc value: [0.78888889 0.68543653 0.89442719 0.67082039 0.77777778 0.56980288 0.67082039 0.2236068 0.67082039 0.55555556] mean value: 0.6507956799857747 key: train_mcc value: [0.9509184 0.96326408 0.92682927 0.95121951 0.92682927 0.95121951 0.95121951 0.96348628 0.92682927 0.97560976] mean value: 0.9487424854850787 key: test_accuracy value: [0.89473684 0.84210526 0.94444444 0.83333333 0.88888889 0.77777778 0.83333333 0.61111111 0.83333333 0.77777778] mean value: 0.8236842105263158 key: train_accuracy value: [0.97546012 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98170732 0.96341463 0.98780488] mean value: 0.9743640580577585 key: test_fscore value: [0.88888889 0.85714286 
0.94117647 0.84210526 0.88888889 0.75 0.82352941 0.63157895 0.82352941 0.77777778] mean value: 0.8224617917342376 key: train_fscore value: [0.97560976 0.98159509 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98181818 0.96341463 0.98780488] mean value: 0.974390107872077 key: test_precision value: [0.88888889 0.81818182 1. 0.8 0.88888889 0.85714286 0.875 0.6 0.875 0.77777778] mean value: 0.8380880230880231 key: train_precision value: [0.97560976 0.97560976 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.97590361 0.96341463 0.98780488] mean value: 0.9732001175433441 key: test_recall value: [0.88888889 0.9 0.88888889 0.88888889 0.88888889 0.66666667 0.77777778 0.66666667 0.77777778 0.77777778] mean value: 0.8122222222222222 key: train_recall value: [0.97560976 0.98765432 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98780488 0.96341463 0.98780488] mean value: 0.9755947003914484 key: test_roc_auc value: [0.89444444 0.83888889 0.94444444 0.83333333 0.88888889 0.77777778 0.83333333 0.61111111 0.83333333 0.77777778] mean value: 0.8233333333333333 key: train_roc_auc value: [0.9754592 0.98163204 0.96341463 0.97560976 0.96341463 0.97560976 0.97560976 0.98170732 0.96341463 0.98780488] mean value: 0.97436766034327 key: test_jcc value: [0.8 0.75 0.88888889 0.72727273 0.8 0.6 0.7 0.46153846 0.7 0.63636364] mean value: 0.7064063714063714 key: train_jcc value: [0.95238095 0.96385542 0.92941176 0.95238095 0.92941176 0.95238095 0.95238095 0.96428571 0.92941176 0.97590361] mean value: 0.9501803854071749 MCC on Blind test: 0.71 Accuracy on Blind test: 0.86 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02767658 0.05742502 0.03703356 0.02705717 0.07875538 0.02220988 0.03057122 0.06068015 0.07214427 0.02592254] mean value: 0.043947577476501465 key: score_time value: [0.01175785 0.02437687 0.01173878 0.01178861 0.01374912 0.01184273 0.01169515 0.01199579 0.01190901 0.01173973] mean value: 0.013259363174438477 key: test_mcc value: [0.68888889 0.48934516 0.70710678 0.67082039 0.70710678 0.62017367 0.4472136 0.47140452 0.67082039 0.79772404] mean value: 0.6270604226143301 key: train_mcc value: [0.8039452 0.84056007 0.84202713 0.86643371 0.80487805 0.85391256 0.85467601 0.86643371 0.83025669 0.85391256] mean value: 0.8417035691773689 key: test_accuracy value: [0.84210526 0.73684211 0.83333333 0.83333333 0.83333333 0.77777778 0.72222222 0.72222222 0.83333333 0.88888889] mean value: 0.8023391812865497 key: train_accuracy value: [0.90184049 0.9202454 0.92073171 0.93292683 0.90243902 0.92682927 0.92682927 0.93292683 0.91463415 
0.92682927] mean value: 0.9206232231033967 key: test_fscore value: [0.84210526 0.7826087 0.8 0.82352941 0.85714286 0.71428571 0.70588235 0.66666667 0.82352941 0.875 ] mean value: 0.7890750373375895 key: train_fscore value: [0.90123457 0.9202454 0.91925466 0.93167702 0.90243902 0.92592593 0.925 0.93167702 0.9125 0.92592593] mean value: 0.9195879538568511 key: test_precision value: [0.8 0.69230769 1. 0.875 0.75 1. 0.75 0.83333333 0.875 1. ] mean value: 0.8575641025641025 key: train_precision value: [0.9125 0.91463415 0.93670886 0.94936709 0.90243902 0.9375 0.94871795 0.94936709 0.93589744 0.9375 ] mean value: 0.9324631593321775 key: test_recall value: [0.88888889 0.9 0.66666667 0.77777778 1. 0.55555556 0.66666667 0.55555556 0.77777778 0.77777778] mean value: 0.7566666666666667 key: train_recall value: [0.8902439 0.92592593 0.90243902 0.91463415 0.90243902 0.91463415 0.90243902 0.91463415 0.8902439 0.91463415] mean value: 0.907226738934056 key: test_roc_auc value: [0.84444444 0.72777778 0.83333333 0.83333333 0.83333333 0.77777778 0.72222222 0.72222222 0.83333333 0.88888889] mean value: 0.8016666666666666 key: train_roc_auc value: [0.90191207 0.92028004 0.92073171 0.93292683 0.90243902 0.92682927 0.92682927 0.93292683 0.91463415 0.92682927] mean value: 0.9206338452273412 key: test_jcc value: [0.72727273 0.64285714 0.66666667 0.7 0.75 0.55555556 0.54545455 0.5 0.7 0.77777778] mean value: 0.6565584415584416 key: train_jcc value: [0.82022472 0.85227273 0.85057471 0.87209302 0.82222222 0.86206897 0.86046512 0.87209302 0.83908046 0.86206897] mean value: 0.8513163934835046 MCC on Blind test: 0.4 Accuracy on Blind test: 0.73 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.96496487 0.98876548 0.86448479 1.19553041 0.70477009 0.910182 0.83962703 0.85949993 0.71075249 1.0370965 ]
mean value: 0.9075673580169678
key: score_time
value: [0.01353359 0.01326942 0.01348686 0.01345611 0.01320839 0.01332855 0.01315117 0.01320934 0.01310396 0.01312208]
mean value: 0.013286948204040527
key: test_mcc
value: [0.78888889 0.78888889 0.70710678 0.89442719 1. 0.70710678 0.4472136 0.62017367 0.56980288 0.79772404]
mean value: 0.7321332717112444
key: train_mcc
value: [1. 1. 1. 1. 1. 1. 0.92710507 1. 1. 1. ]
mean value: 0.9927105069301106
key: test_accuracy
value: [0.89473684 0.89473684 0.83333333 0.94444444 1. 0.83333333 0.72222222 0.77777778 0.77777778 0.88888889]
mean value: 0.8567251461988304
key: train_accuracy
value: [1. 1. 1. 1. 1. 1. 0.96341463 1. 1. 1. ]
mean value: 0.9963414634146341
key: test_fscore
value: [0.88888889 0.9 0.8 0.94117647 1. 0.8 0.70588235 0.71428571 0.75 0.875 ]
mean value: 0.8375233426704015
key: train_fscore
value: [1. 1. 1. 1. 1. 1. 0.96296296 1. 1. 1. ]
mean value: 0.9962962962962962
key: test_precision
value: [0.88888889 0.9 1. 1. 1. 1. 0.75 1. 0.85714286 1. ]
mean value: 0.9396031746031746
key: train_precision
value: [1. 1. 1. 1. 1. 1. 0.975 1. 1. 1. ]
mean value: 0.9975
key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 1. 0.66666667 0.66666667 0.55555556 0.66666667 0.77777778]
mean value: 0.7677777777777778
key: train_recall
value: [1. 1. 1. 1. 1. 1. 0.95121951 1. 1. 1. ]
mean value: 0.9951219512195122
key: test_roc_auc
value: [0.89444444 0.89444444 0.83333333 0.94444444 1. 0.83333333 0.72222222 0.77777778 0.77777778 0.88888889]
mean value: 0.8566666666666666
key: train_roc_auc
value: [1. 1. 1. 1. 1. 1. 0.96341463 1. 1. 1. ]
mean value: 0.9963414634146341
key: test_jcc
value: [0.8 0.81818182 0.66666667 0.88888889 1. 0.66666667 0.54545455 0.55555556 0.6 0.77777778]
mean value: 0.7319191919191919
key: train_jcc
value: [1. 1. 1. 1. 1. 1. 0.92857143 1. 1. 1. ]
mean value: 0.9928571428571429

MCC on Blind test: 0.59
Accuracy on Blind test: 0.81

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())])

key: fit_time
value: [0.01298213 0.0095911 0.00900984 0.00858641 0.00865221 0.0084579 0.00859547 0.00881672 0.00866461 0.00866199]
mean value: 0.009201836585998536
key: score_time
value: [0.01462412 0.00896907 0.0085001 0.00847125 0.00839853 0.00842166 0.00838089 0.00849271 0.00842547 0.0084095 ]
mean value: 0.00910933017730713
key: test_mcc
value: [ 0.19096397 -0.2236068 0.26726124 0.53452248 0.26726124 0.4472136 0.23570226 -0.12403473 0.23570226 0.35355339]
mean value: 0.21845389086052888
key: train_mcc
value: [0.37955068 0.49121874 0.35651205 0.44106783 0.46159309 0.4083697 0.45222959 0.44501237 0.3962947 0.43158776]
mean value: 0.4263436491486879
key: test_accuracy
value: [0.57894737 0.47368421 0.61111111 0.72222222 0.61111111 0.66666667 0.61111111 0.44444444 0.61111111 0.61111111]
mean value: 0.5941520467836258
key: train_accuracy
value: [0.66871166 0.69325153 0.63414634 0.67682927 0.68292683 0.67682927 0.68902439 0.70121951 0.67682927 0.67682927]
mean value: 0.6776597336525513
key: test_fscore
value: [0.63636364 0.64285714 0.69565217 0.7826087 0.69565217 0.75 0.66666667 0.54545455 0.66666667 0.72 ]
mean value: 0.6801921701486919
key: train_fscore
value: [0.73267327 0.76415094 0.72477064 0.75117371 0.75700935 0.74146341 0.75598086 0.75376884 0.73631841 0.74881517]
mean value: 0.7466124601575621
key: test_precision
value: [0.53846154 0.5 0.57142857 0.64285714 0.57142857 0.6 0.58333333 0.46153846 0.58333333 0.5625 ]
mean value: 0.5614880952380953
key: train_precision
value: [0.61666667 0.61832061 0.58088235 0.61068702 0.61363636 0.61788618 0.62204724 0.64102564 0.62184874 0.6124031 ]
mean value: 0.6155403921084903
key: test_recall
value: [0.77777778 0.9 0.88888889 1. 0.88888889 1. 0.77777778 0.66666667 0.77777778 1. ]
mean value: 0.8677777777777778
key: train_recall
value: [0.90243902 1. 0.96341463 0.97560976 0.98780488 0.92682927 0.96341463 0.91463415 0.90243902 0.96341463]
mean value: 0.95
key: test_roc_auc
value: [0.58888889 0.45 0.61111111 0.72222222 0.61111111 0.66666667 0.61111111 0.44444444 0.61111111 0.61111111]
mean value: 0.5927777777777778
key: train_roc_auc
value: [0.66726889 0.69512195 0.63414634 0.67682927 0.68292683 0.67682927 0.68902439 0.70121951 0.67682927 0.67682927]
mean value: 0.6777024992472147
key: test_jcc
value: [0.46666667 0.47368421 0.53333333 0.64285714 0.53333333 0.6 0.5 0.375 0.5 0.5625 ]
mean value: 0.5187374686716792
key: train_jcc
value: [0.578125 0.61832061 0.56834532 0.60150376 0.60902256 0.58914729 0.60769231 0.60483871 0.58267717 0.59848485]
mean value: 0.5958157568248116

MCC on Blind test: 0.27
Accuracy on Blind test: 0.68

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())])

key: fit_time
value: [0.00887132 0.0087862 0.00888085 0.00876236 0.00883341 0.00878835 0.00874257 0.00890851 0.00874734 0.00910783]
mean value: 0.008842873573303222
key: score_time
value: [0.00842404 0.00844836 0.00844502 0.00852418 0.00842857 0.00843644 0.00836802 0.00834298 0.00847101 0.00843644]
mean value: 0.008432507514953613
key: test_mcc
value: [ 0.26666667 0.26257545 0.56980288 0.2236068 0.11396058 -0.11111111 0.34188173 0.23570226 0.2236068 0.34188173]
mean value: 0.24685737827811435
key: train_mcc
value: [0.44782413 0.43577775 0.43902439 0.47649639 0.48911599 0.46396698 0.50003718 0.42762497 0.45125307 0.36683699]
mean value: 0.44979578382501245
key: test_accuracy
value: [0.63157895 0.63157895 0.77777778 0.61111111 0.55555556 0.44444444 0.66666667 0.61111111 0.61111111 0.66666667]
mean value: 0.6207602339181286
key: train_accuracy
value: [0.72392638 0.71779141 0.7195122 0.73780488 0.74390244 0.73170732 0.75 0.71341463 0.72560976 0.68292683]
mean value: 0.7246595840191531
key: test_fscore
value: [0.63157895 0.69565217 0.75 0.58823529 0.5 0.44444444 0.625 0.53333333 0.63157895 0.7 ]
mean value: 0.6099823140545311
key: train_fscore
value: [0.72727273 0.7195122 0.7195122 0.72955975 0.75294118 0.73809524 0.74846626 0.70440252 0.72392638 0.67088608]
mean value: 0.7234574510219577
key: test_precision
value: [0.6 0.61538462 0.85714286 0.625 0.57142857 0.44444444 0.71428571 0.66666667 0.6 0.63636364]
mean value: 0.6330716505716506
key: train_precision
value: [0.72289157 0.71084337 0.7195122 0.75324675 0.72727273 0.72093023 0.75308642 0.72727273 0.72839506 0.69736842]
mean value:
0.7260819477765448 key: test_recall value: [0.66666667 0.8 0.66666667 0.55555556 0.44444444 0.44444444 0.55555556 0.44444444 0.66666667 0.77777778] mean value: 0.6022222222222222 key: train_recall value: [0.73170732 0.72839506 0.7195122 0.70731707 0.7804878 0.75609756 0.74390244 0.68292683 0.7195122 0.64634146] mean value: 0.7216199939777176 key: test_roc_auc value: [0.63333333 0.62222222 0.77777778 0.61111111 0.55555556 0.44444444 0.66666667 0.61111111 0.61111111 0.66666667] mean value: 0.62 key: train_roc_auc value: [0.72387835 0.71785607 0.7195122 0.73780488 0.74390244 0.73170732 0.75 0.71341463 0.72560976 0.68292683] mean value: 0.7246612466124661 key: test_jcc value: [0.46153846 0.53333333 0.6 0.41666667 0.33333333 0.28571429 0.45454545 0.36363636 0.46153846 0.53846154] mean value: 0.44487678987678986 key: train_jcc value: [0.57142857 0.56190476 0.56190476 0.57425743 0.60377358 0.58490566 0.59803922 0.54368932 0.56730769 0.5047619 ] mean value: 0.5671972899407909 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00865221 0.0083437 0.00835395 0.00934315 0.00946736 0.00935578 0.00944567 0.00965834 0.00930619 0.00942564] mean value: 0.009135198593139649 key: score_time value: [0.01036048 0.00950432 0.00968695 0.01048803 0.01113009 0.01024699 0.01019239 0.01024103 0.01018548 0.01024628] mean value: 0.010228204727172851 key: test_mcc value: [ 0.25844328 0.28752732 -0.12403473 0.23570226 0.2236068 0. 0.11396058 0.34188173 0.11396058 0.2236068 ] mean value: 0.1674654600608012 key: train_mcc value: [0.42370843 0.42387312 0.44556639 0.43229648 0.39211447 0.47649639 0.47032008 0.46563593 0.47249649 0.4539621 ] mean value: 0.44564698943223807 key: test_accuracy value: [0.63157895 0.63157895 0.44444444 0.61111111 0.61111111 0.5 0.55555556 0.66666667 0.55555556 0.61111111] mean value: 0.5818713450292398 key: train_accuracy value: [0.71165644 0.71165644 0.7195122 0.71341463 0.69512195 0.73780488 0.73170732 0.73170732 0.73170732 0.72560976] mean value: 0.7209898249289242 key: test_fscore value: [0.58823529 0.58823529 0.28571429 0.53333333 0.58823529 0.30769231 0.5 0.625 0.5 0.63157895] mean value: 0.5148024756461289 key: train_fscore value: [0.70807453 0.70063694 0.69333333 0.68874172 0.67948718 0.72955975 0.70666667 0.71794872 0.7027027 0.70967742] mean value: 0.7036828966612066 key: test_precision value: [0.625 0.71428571 0.4 0.66666667 0.625 0.5 0.57142857 0.71428571 0.57142857 0.6 ] mean value: 0.5988095238095238 key: train_precision value: [0.72151899 0.72368421 0.76470588 0.75362319 0.71621622 0.75324675 0.77941176 0.75675676 0.78787879 0.75342466] mean value: 0.751046720496547 
key: test_recall value: [0.55555556 0.5 0.22222222 0.44444444 0.55555556 0.22222222 0.44444444 0.55555556 0.44444444 0.66666667] mean value: 0.4611111111111111 key: train_recall value: [0.69512195 0.67901235 0.63414634 0.63414634 0.64634146 0.70731707 0.64634146 0.68292683 0.63414634 0.67073171] mean value: 0.6630231857874135 key: test_roc_auc value: [0.62777778 0.63888889 0.44444444 0.61111111 0.61111111 0.5 0.55555556 0.66666667 0.55555556 0.61111111] mean value: 0.5822222222222223 key: train_roc_auc value: [0.71175851 0.71145739 0.7195122 0.71341463 0.69512195 0.73780488 0.73170732 0.73170732 0.73170732 0.72560976] mean value: 0.7209801264679314 key: test_jcc value: [0.41666667 0.41666667 0.16666667 0.36363636 0.41666667 0.18181818 0.33333333 0.45454545 0.33333333 0.46153846] mean value: 0.3544871794871795 key: train_jcc value: [0.54807692 0.53921569 0.53061224 0.52525253 0.51456311 0.57425743 0.54639175 0.56 0.54166667 0.55 ] mean value: 0.5430036331284595 MCC on Blind test: -0.08 Accuracy on Blind test: 0.49 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01206779 0.01198173 0.01209545 0.01191854 0.01200819 0.0120225 0.01206994 0.01187992 0.0115304 0.01188707] mean value: 0.011946153640747071 key: score_time value: [0.00982642 0.00977206 0.00988817 0.00984406 0.00975108 0.0098269 0.00982642 0.00989437 0.00984883 0.00886917] mean value: 0.009734749794006348 key: test_mcc value: [0.15555556 0.26257545 0.70710678 0.77777778 0.26726124 0.26726124 0.55555556 0.23570226 0.55555556 0.56980288] mean value: 0.43541543059640053 key: train_mcc value: [0.71781359 0.79198683 0.74395776 0.78141806 0.74440079 0.7804878 0.75699875 0.74395776 0.74528923 0.79321396] mean value: 0.7599524542438163 key: test_accuracy value: [0.57894737 0.63157895 0.83333333 0.88888889 0.61111111 0.61111111 0.77777778 0.61111111 0.77777778 0.77777778] mean value: 0.7099415204678363 key: train_accuracy value: [0.85889571 0.89570552 0.87195122 0.8902439 0.87195122 0.8902439 0.87804878 0.87195122 0.87195122 0.89634146] mean value: 0.8797284153823134 key: test_fscore value: [0.55555556 0.69565217 0.8 0.88888889 0.69565217 0.46153846 0.77777778 0.53333333 0.77777778 0.75 ] mean value: 0.6936176142697882 key: train_fscore value: [0.86060606 0.8969697 0.87272727 0.8875 0.8742515 0.8902439 0.875 0.87272727 0.86792453 0.89820359] mean value: 0.8796153823591574 key: test_precision value: [0.55555556 0.61538462 1. 
0.88888889 0.57142857 0.75 0.77777778 0.66666667 0.77777778 0.85714286] mean value: 0.7460622710622711 key: train_precision value: [0.85542169 0.88095238 0.86746988 0.91025641 0.85882353 0.8902439 0.8974359 0.86746988 0.8961039 0.88235294] mean value: 0.8806530403558976 key: test_recall value: [0.55555556 0.8 0.66666667 0.88888889 0.88888889 0.33333333 0.77777778 0.44444444 0.77777778 0.66666667] mean value: 0.6799999999999999 key: train_recall value: [0.86585366 0.91358025 0.87804878 0.86585366 0.8902439 0.8902439 0.85365854 0.87804878 0.84146341 0.91463415] mean value: 0.8791629027401385 key: test_roc_auc value: [0.57777778 0.62222222 0.83333333 0.88888889 0.61111111 0.61111111 0.77777778 0.61111111 0.77777778 0.77777778] mean value: 0.7088888888888889 key: train_roc_auc value: [0.85885276 0.89581451 0.87195122 0.8902439 0.87195122 0.8902439 0.87804878 0.87195122 0.87195122 0.89634146] mean value: 0.8797350195724178 key: test_jcc value: [0.38461538 0.53333333 0.66666667 0.8 0.53333333 0.3 0.63636364 0.36363636 0.63636364 0.6 ] mean value: 0.5454312354312354 key: train_jcc value: [0.75531915 0.81318681 0.77419355 0.79775281 0.77659574 0.8021978 0.77777778 0.77419355 0.76666667 0.81521739] mean value: 0.7853101250513387 MCC on Blind test: 0.05 Accuracy on Blind test: 0.57 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.6738658 0.89805007 0.65028882 0.70737767 0.84737277 0.68717527 0.66062117 0.86410022 0.6574862 0.7034452 ] mean value: 0.7349783182144165 key: score_time value: [0.01342988 0.0133884 0.01378322 0.01348329 0.01350474 0.01336622 0.01323557 0.01228476 0.01329565 0.01332974] mean value: 0.01331014633178711 key: test_mcc value: [0.47777778 0.4719399 0.70710678 0.89442719 0.89442719 0.47140452 0.67082039 0.56980288 0.3721042 0.70710678] mean value: 0.6236917625981757 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 0.73684211 0.83333333 0.94444444 0.94444444 0.72222222 0.83333333 0.77777778 0.66666667 0.83333333] mean value: 0.8029239766081872 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.73684211 0.76190476 0.8 0.94117647 0.94736842 0.66666667 0.82352941 0.75 0.57142857 0.8 ] mean value: 0.7798916408668731 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.7 0.72727273 1. 1. 0.9 0.83333333 0.875 0.85714286 0.8 1. ] mean value: 0.8692748917748918 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 0.8 0.66666667 0.88888889 1. 0.55555556 0.77777778 0.66666667 0.44444444 0.66666667] mean value: 0.7244444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73888889 0.73333333 0.83333333 0.94444444 0.94444444 0.72222222 0.83333333 0.77777778 0.66666667 0.83333333] mean value: 0.8027777777777777 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58333333 0.61538462 0.66666667 0.88888889 0.9 0.5 0.7 0.6 0.4 0.66666667] mean value: 0.652094017094017 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.34 Accuracy on Blind test: 0.7 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01789188 0.01565242 0.01368737 0.01377201 0.013309 0.01342058 0.01321268 0.01274252 0.01264119 0.01333928] mean value: 0.013966894149780274 key: score_time value: [0.01292944 0.01085591 0.00963998 0.0095849 0.00976825 0.00909972 0.00912642 0.00914407 0.00917363 0.00912905] mean value: 0.009845137596130371 key: test_mcc value: [0.89893315 0.80507649 0.89442719 0.89442719 0.89442719 0.89442719 0.89442719 0.67082039 0.67082039 1. ] mean value: 0.8517786377349856 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.89473684 0.94444444 0.94444444 0.94444444 0.94444444 0.94444444 0.83333333 0.83333333 1. ] mean value: 0.9230994152046783 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.90909091 0.94117647 0.94117647 0.94736842 0.94117647 0.94117647 0.82352941 0.82352941 1. ] mean value: 0.9209400506614129 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.83333333 1. 1. 0.9 1. 1. 0.875 0.875 1. ] mean value: 0.9483333333333334 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 1. 0.88888889 0.88888889 1. 0.88888889 0.88888889 0.77777778 0.77777778 1. ] mean value: 0.9 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.88888889 0.94444444 0.94444444 0.94444444 0.94444444 0.94444444 0.83333333 0.83333333 1. 
] mean value: 0.9222222222222222 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.83333333 0.88888889 0.88888889 0.9 0.88888889 0.88888889 0.7 0.7 1. ] mean value: 0.8577777777777778 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.08970523 0.09563923 0.09762454 0.09816647 0.09601331 0.09740019 0.09686017 0.09789848 0.09540892 0.09602714] mean value: 0.09607436656951904 key: score_time value: [0.01693845 0.01738167 0.0185194 0.01805615 0.01852155 0.01874089 0.01831293 0.01796603 0.01841974 0.01797104] mean value: 0.018082785606384277 key: test_mcc value: [0.68888889 0.80507649 0.79772404 0.77777778 0.67082039 0.70710678 0.56980288 0.56980288 0.89442719 0.77777778] mean value: 0.7259205095594103 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.84210526 0.89473684 0.88888889 0.88888889 0.83333333 0.83333333 0.77777778 0.77777778 0.94444444 0.88888889] mean value: 0.8570175438596491 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.84210526 0.90909091 0.875 0.88888889 0.84210526 0.8 0.75 0.75 0.94736842 0.88888889] mean value: 0.8493447634237108 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.83333333 1. 0.88888889 0.8 1. 0.85714286 0.85714286 0.9 0.88888889] mean value: 0.8825396825396825 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 1. 0.77777778 0.88888889 0.88888889 0.66666667 0.66666667 0.66666667 1. 0.88888889] mean value: 0.8333333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.84444444 0.88888889 0.88888889 0.88888889 0.83333333 0.83333333 0.77777778 0.77777778 0.94444444 0.88888889] mean value: 0.8566666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.72727273 0.83333333 0.77777778 0.8 0.72727273 0.66666667 0.6 0.6 0.9 0.8 ] mean value: 0.7432323232323232 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00972438 0.00980639 0.0098176 0.00968933 0.00974798 0.00983357 0.00982904 0.00982022 0.00935078 0.00986123] mean value: 0.009748053550720216 key: score_time value: [0.00927591 0.00916839 0.00913358 0.00922513 0.00917125 0.00919223 0.00910759 0.00911784 0.00923991 0.00912786] mean value: 0.009175968170166016 key: test_mcc value: [0.4719399 0.36666667 0.3721042 0.67082039 0.47140452 0.26726124 0.56980288 0.24253563 0.79772404 0.62017367] mean value: 0.48504331456099853 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.73684211 0.68421053 0.66666667 0.83333333 0.72222222 0.61111111 0.77777778 0.55555556 0.88888889 0.77777778] mean value: 0.7254385964912281 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.70588235 0.7 0.57142857 0.82352941 0.66666667 0.46153846 0.75 0.2 0.9 0.71428571] mean value: 0.6493331178625297 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.7 0.8 0.875 0.83333333 0.75 0.85714286 1. 
0.81818182 1. ] mean value: 0.8383658008658008 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.7 0.44444444 0.77777778 0.55555556 0.33333333 0.66666667 0.11111111 1. 0.55555556] mean value: 0.5811111111111111 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.73333333 0.68333333 0.66666667 0.83333333 0.72222222 0.61111111 0.77777778 0.55555556 0.88888889 0.77777778] mean value: 0.725 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.54545455 0.53846154 0.4 0.7 0.5 0.3 0.6 0.11111111 0.81818182 0.55555556] mean value: 0.5068764568764569 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.59 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, 
min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.22694707 1.18884277 1.18088603 1.20042324 1.1879015 1.21066332 1.19598603 1.18754077 1.18028498 1.21017814] mean value: 1.1969653844833374 key: score_time value: [0.09555411 0.09508872 0.09232116 0.09383941 0.09496737 0.08769631 0.09453797 0.0888617 0.09319115 0.09555507] mean value: 0.09316129684448242 key: test_mcc value: [0.89893315 0.68543653 0.79772404 0.89442719 0.77777778 0.79772404 0.79772404 0.56980288 0.79772404 0.79772404] mean value: 0.781499770395183 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.84210526 0.88888889 0.94444444 0.88888889 0.88888889 0.88888889 0.77777778 0.88888889 0.88888889] mean value: 0.8845029239766081 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.85714286 0.875 0.94117647 0.88888889 0.875 0.875 0.75 0.9 0.875 ] mean value: 0.8778384687208217 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.81818182 1. 1. 0.88888889 1. 1. 0.85714286 0.81818182 1. ] mean value: 0.9382395382395382 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 0.9 0.77777778 0.88888889 0.88888889 0.77777778 0.77777778 0.66666667 1. 0.77777778] mean value: 0.8344444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.94444444 0.83888889 0.88888889 0.94444444 0.88888889 0.88888889 0.88888889 0.77777778 0.88888889 0.88888889] mean value: 0.8838888888888888 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
[0.88888889 0.75 0.77777778 0.88888889 0.8 0.77777778 0.77777778 0.6 0.81818182 0.77777778] mean value: 0.7857070707070707 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.85279465 0.89258766 0.89829969 0.88843799 0.85441613 0.87707686 0.86382127 0.86021209 0.94803047 0.88763785] mean value: 0.8823314666748047 key: score_time value: [0.19736314 0.22194862 0.22088242 0.25448847 0.24951029 0.20679188 0.24040985 0.20101738 0.25407386 0.20660353] mean value: 0.2253089427947998 key: test_mcc value: [0.78888889 0.48934516 0.70710678 0.77777778 0.67082039 0.70710678 0.89442719 0.4472136 0.56980288 0.79772404] mean value: 0.6850213490232173 key: train_mcc value: [0.96325856 0.95121218 0.92682927 0.95150257 0.96348628 0.93909422 0.96348628 0.97590007 0.93909422 0.93909422] mean value: 0.951295788364494 key: test_accuracy value: [0.89473684 0.73684211 0.83333333 0.88888889 0.83333333 0.83333333 0.94444444 0.72222222 0.77777778 0.88888889] mean value: 0.8353801169590643 key: train_accuracy value: [0.98159509 0.97546012 0.96341463 0.97560976 0.98170732 0.9695122 0.98170732 0.98780488 0.9695122 0.9695122 ] mean value: 0.9755835702528804 key: test_fscore value: [0.88888889 0.7826087 0.8 0.88888889 0.84210526 0.8 0.94117647 0.70588235 0.8 0.875 ] mean value: 0.8324550560117259 key: train_fscore value: [0.98181818 0.97560976 0.96341463 0.97590361 0.98181818 0.96969697 0.98181818 0.98795181 0.96969697 0.96969697] mean value: 0.9757425266476104 key: test_precision value: [0.88888889 0.69230769 1. 0.88888889 0.8 1. 1. 0.75 0.72727273 1. 
] mean value: 0.8747358197358197 key: train_precision value: [0.97590361 0.96385542 0.96341463 0.96428571 0.97590361 0.96385542 0.97590361 0.97619048 0.96385542 0.96385542] mean value: 0.9687023354743014 key: test_recall value: [0.88888889 0.9 0.66666667 0.88888889 0.88888889 0.66666667 0.88888889 0.66666667 0.88888889 0.77777778] mean value: 0.8122222222222222 key: train_recall value: [0.98780488 0.98765432 0.96341463 0.98780488 0.98780488 0.97560976 0.98780488 1. 0.97560976 0.97560976] mean value: 0.9829117735621801 key: test_roc_auc value: [0.89444444 0.72777778 0.83333333 0.88888889 0.83333333 0.83333333 0.94444444 0.72222222 0.77777778 0.88888889] mean value: 0.8344444444444444 key: train_roc_auc value: [0.98155676 0.97553448 0.96341463 0.97560976 0.98170732 0.9695122 0.98170732 0.98780488 0.9695122 0.9695122 ] mean value: 0.975587172538392 key: test_jcc value: [0.8 0.64285714 0.66666667 0.8 0.72727273 0.66666667 0.88888889 0.54545455 0.66666667 0.77777778] mean value: 0.7182251082251082 key: train_jcc value: [0.96428571 0.95238095 0.92941176 0.95294118 0.96428571 0.94117647 0.96428571 0.97619048 0.94117647 0.94117647] mean value: 0.9527310924369747 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02275062 0.00911093 0.00958419 0.00923228 0.00917268 0.01035643 0.00954223 0.00929189 0.01001048 0.00930905] mean value: 0.010836076736450196 key: score_time value: [0.01018906 0.00876546 0.00997353 0.00875711 0.0086844 0.0092597 0.00946522 0.00878239 0.00908685 0.00885129] mean value: 0.009181499481201172 key: test_mcc value: [ 0.26666667 0.26257545 0.56980288 0.2236068 0.11396058 -0.11111111 0.34188173 0.23570226 0.2236068 0.34188173] mean value: 0.24685737827811435 key: train_mcc value: [0.44782413 0.43577775 0.43902439 0.47649639 0.48911599 0.46396698 0.50003718 0.42762497 0.45125307 0.36683699] mean value: 0.44979578382501245 key: test_accuracy value: [0.63157895 0.63157895 0.77777778 0.61111111 0.55555556 0.44444444 0.66666667 0.61111111 0.61111111 0.66666667] mean value: 0.6207602339181286 key: train_accuracy value: [0.72392638 0.71779141 0.7195122 0.73780488 0.74390244 0.73170732 0.75 0.71341463 0.72560976 0.68292683] mean value: 0.7246595840191531 key: test_fscore value: [0.63157895 0.69565217 0.75 0.58823529 0.5 0.44444444 0.625 0.53333333 0.63157895 0.7 ] mean value: 0.6099823140545311 key: train_fscore value: [0.72727273 0.7195122 0.7195122 0.72955975 0.75294118 0.73809524 0.74846626 0.70440252 0.72392638 0.67088608] mean value: 0.7234574510219577 key: test_precision value: [0.6 0.61538462 0.85714286 0.625 0.57142857 0.44444444 0.71428571 0.66666667 0.6 0.63636364] mean value: 0.6330716505716506 key: train_precision value: [0.72289157 0.71084337 0.7195122 0.75324675 0.72727273 0.72093023 0.75308642 0.72727273 0.72839506 0.69736842] mean value: 
0.7260819477765448 key: test_recall value: [0.66666667 0.8 0.66666667 0.55555556 0.44444444 0.44444444 0.55555556 0.44444444 0.66666667 0.77777778] mean value: 0.6022222222222222 key: train_recall value: [0.73170732 0.72839506 0.7195122 0.70731707 0.7804878 0.75609756 0.74390244 0.68292683 0.7195122 0.64634146] mean value: 0.7216199939777176 key: test_roc_auc value: [0.63333333 0.62222222 0.77777778 0.61111111 0.55555556 0.44444444 0.66666667 0.61111111 0.61111111 0.66666667] mean value: 0.62 key: train_roc_auc value: [0.72387835 0.71785607 0.7195122 0.73780488 0.74390244 0.73170732 0.75 0.71341463 0.72560976 0.68292683] mean value: 0.7246612466124661 key: test_jcc value: [0.46153846 0.53333333 0.6 0.41666667 0.33333333 0.28571429 0.45454545 0.36363636 0.46153846 0.53846154] mean value: 0.44487678987678986 key: train_jcc value: [0.57142857 0.56190476 0.56190476 0.57425743 0.60377358 0.58490566 0.59803922 0.54368932 0.56730769 0.5047619 ] mean value: 0.5671972899407909 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.07006073 0.05620217 0.05455995 0.05443859 0.05530071 0.04911423 0.05309939 0.05028129 0.06260443 0.05637097] mean value: 0.056203246116638184 key: score_time value: [0.01024413 0.01085782 0.01115632 0.01059151 0.01034355 0.01052856 0.0102272 0.01016331 0.0101974 0.01014662] mean value: 0.010445642471313476 key: test_mcc value: [1. 0.89893315 1. 0.89442719 0.89442719 0.89442719 0.89442719 0.67082039 0.89442719 1. ] mean value: 0.9041889498200506 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.94736842 1. 0.94444444 0.94444444 0.94444444 0.94444444 0.83333333 0.94444444 1. ] mean value: 0.9502923976608187 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 1. 0.94117647 0.94736842 0.94117647 0.94117647 0.82352941 0.94736842 1. ] mean value: 0.9494176618015627 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.90909091 1. 1. 0.9 1. 1. 0.875 0.9 1. ] mean value: 0.9584090909090909 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.88888889 1. 0.88888889 0.88888889 0.77777778 1. 1. ] mean value: 0.9444444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.94444444 1. 0.94444444 0.94444444 0.94444444 0.94444444 0.83333333 0.94444444 1. 
] mean value: 0.95 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 1. 0.88888889 0.9 0.88888889 0.88888889 0.7 0.9 1. ] mean value: 0.9075757575757576 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.83 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01936173 0.02427292 0.02441263 0.04486084 0.04475188 0.04609394 0.03421617 0.04422712 0.02395058 0.04555655] mean value: 0.03517043590545654 key: score_time value: [0.01165771 0.01188326 0.02139616 0.0117929 0.02368331 0.01183486 0.02326417 0.01176476 0.0116334 0.02139091] mean value: 0.016030144691467286 key: test_mcc value: [0.80903983 0.36666667 0.77777778 0.56980288 0.62017367 0.79772404 0.70710678 0.47140452 0.47140452 0.70710678] mean value: 0.6298207473817191 key: train_mcc value: [0.98780488 1. 0.97590007 0.98787834 0.98787834 1. 0.98787834 0.98787834 1. 0.97560976] mean value: 0.9890828066723727 key: test_accuracy value: [0.89473684 0.68421053 0.88888889 0.77777778 0.77777778 0.88888889 0.83333333 0.72222222 0.72222222 0.83333333] mean value: 0.8023391812865497 key: train_accuracy value: [0.99386503 1. 0.98780488 0.99390244 0.99390244 1. 
0.99390244 0.99390244 1. 0.98780488] mean value: 0.9945084542869969 key: test_fscore value: [0.9 0.7 0.88888889 0.75 0.71428571 0.875 0.8 0.66666667 0.66666667 0.8 ] mean value: 0.7761507936507936 key: train_fscore value: [0.99386503 1. 0.98765432 0.99386503 0.99386503 1. 0.99393939 0.99393939 1. 0.98780488] mean value: 0.9944933078939763 key: test_precision value: [0.81818182 0.7 0.88888889 0.85714286 1. 1. 1. 0.83333333 0.83333333 1. ] mean value: 0.893088023088023 key: train_precision value: [1. 1. 1. 1. 1. 1. 0.98795181 0.98795181 1. 0.98780488] mean value: 0.9963708492506612 key: test_recall value: [1. 0.7 0.88888889 0.66666667 0.55555556 0.77777778 0.66666667 0.55555556 0.55555556 0.66666667] mean value: 0.7033333333333334 key: train_recall value: [0.98780488 1. 0.97560976 0.98780488 0.98780488 1. 1. 1. 1. 0.98780488] mean value: 0.9926829268292683 key: test_roc_auc value: [0.9 0.68333333 0.88888889 0.77777778 0.77777778 0.88888889 0.83333333 0.72222222 0.72222222 0.83333333] mean value: 0.8027777777777778 key: train_roc_auc value: [0.99390244 1. 0.98780488 0.99390244 0.99390244 1. 0.99390244 0.99390244 1. 0.98780488] mean value: 0.9945121951219512 key: test_jcc value: [0.81818182 0.53846154 0.8 0.6 0.55555556 0.77777778 0.66666667 0.5 0.5 0.66666667] mean value: 0.6423310023310024 key: train_jcc value: [0.98780488 1. 0.97560976 0.98780488 0.98780488 1. 0.98795181 0.98795181 1. 
0.97590361] mean value: 0.9890831619159565 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: Multinomial Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01200485 0.01285028 0.00930858 0.00868058 0.00853205 0.00864029 0.00857878 0.00883126 0.00857115 0.00861025] mean value: 0.009460806846618652 key: score_time value: [0.01128626 0.0095551 0.00850296 0.00831437 0.00825906 0.00825787 0.00840831 0.00830722 0.00827265 0.00831676] mean value: 0.008748054504394531 key: test_mcc value: [0.06900656 0.25844328 0.77777778 0.4472136 0.56980288 0.34188173 0.33333333 0.11111111 0.11111111 0.4472136 ] mean value: 0.34668949725554976 key: train_mcc value: [0.52587807 0.49804037 0.44112877 0.47850059 0.41512835 0.45533504 0.49147319 0.46845799 0.52757758 0.47735225] mean value: 0.4778872199982366 key: test_accuracy value: [0.52631579 0.63157895 0.88888889 0.72222222 0.77777778 0.66666667 0.66666667 0.55555556 0.55555556 0.72222222] mean value: 0.671345029239766 key: train_accuracy value: [0.7607362 0.74846626 0.7195122 0.73780488 0.70731707 0.72560976 0.74390244 0.73170732 0.76219512 0.73780488] mean value: 0.7375056112524315 key: test_fscore value: [0.57142857 0.66666667 0.88888889 0.70588235 0.8 0.625 0.66666667 0.55555556 0.55555556 0.73684211] mean value: 0.6772486362966239 key: train_fscore value: 
[0.77714286 0.75449102 0.73255814 0.75144509 0.71428571 0.74285714 0.75862069 0.75 0.77456647 0.74853801] mean value: 0.750450513382939 key: test_precision value: [0.5 0.63636364 0.88888889 0.75 0.72727273 0.71428571 0.66666667 0.55555556 0.55555556 0.7 ] mean value: 0.6694588744588744 key: train_precision value: [0.7311828 0.73255814 0.7 0.71428571 0.69767442 0.69892473 0.7173913 0.70212766 0.73626374 0.71910112] mean value: 0.7149509623088506 key: test_recall value: [0.66666667 0.7 0.88888889 0.66666667 0.88888889 0.55555556 0.66666667 0.55555556 0.55555556 0.77777778] mean value: 0.6922222222222222 key: train_recall value: [0.82926829 0.77777778 0.76829268 0.79268293 0.73170732 0.79268293 0.80487805 0.80487805 0.81707317 0.7804878 ] mean value: 0.7899728997289973 key: test_roc_auc value: [0.53333333 0.62777778 0.88888889 0.72222222 0.77777778 0.66666667 0.66666667 0.55555556 0.55555556 0.72222222] mean value: 0.6716666666666666 key: train_roc_auc value: [0.76031316 0.74864499 0.7195122 0.73780488 0.70731707 0.72560976 0.74390244 0.73170732 0.76219512 0.73780488] mean value: 0.7374811803673592 key: test_jcc value: [0.4 0.5 0.8 0.54545455 0.66666667 0.45454545 0.5 0.38461538 0.38461538 0.58333333] mean value: 0.5219230769230769 key: train_jcc value: [0.63551402 0.60576923 0.57798165 0.60185185 0.55555556 0.59090909 0.61111111 0.6 0.63207547 0.59813084] mean value: 0.6008898823084184 MCC on Blind test: 0.13 Accuracy on Blind test: 0.59 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01156712 0.0142765 0.01810694 0.0145216 0.0144875 0.01617956 0.01846528 0.01412177 0.03151178 0.01461315] mean value: 0.01678512096405029 key: score_time value: [0.00826836 0.01126075 0.01134539 0.01134443 0.0113399 0.01279712 0.01281691 0.02762818 0.02830529 0.0123601 ] mean value: 0.014746642112731934 key: test_mcc value: [0.89893315 0.26257545 0.53452248 0.79772404 0.89442719 0.2236068 0.47140452 0.2236068 0.26726124 0.4472136 ] mean value: 0.5021275267511051 key: train_mcc value: [0.89510866 0.90289608 0.85224163 0.82065181 0.9067647 0.89565496 0.94077493 0.83149718 0.60553007 0.64546362] mean value: 0.8296583633781469 key: test_accuracy value: [0.94736842 0.63157895 0.72222222 0.88888889 0.94444444 0.61111111 0.72222222 0.61111111 0.61111111 0.66666667] mean value: 0.735672514619883 key: train_accuracy value: [0.94478528 0.95092025 0.92073171 0.90243902 0.95121951 0.94512195 0.9695122 0.91463415 0.76829268 0.79878049] mean value: 0.9066437228789466 key: test_fscore value: [0.94117647 0.69565217 0.61538462 0.875 0.94736842 0.63157895 0.66666667 0.58823529 0.46153846 0.75 ] mean value: 0.7172601050629722 key: train_fscore value: [0.94193548 0.94936709 0.91390728 0.89189189 0.94871795 0.94797688 0.96855346 0.91764706 0.6984127 0.83076923] mean value: 0.9009179023594288 key: test_precision value: [1. 0.61538462 1. 1. 0.9 0.6 0.83333333 0.625 0.75 0.6 ] mean value: 0.7923717948717949 key: train_precision value: [1. 0.97402597 1. 1. 1. 0.9010989 1. 0.88636364 1. 
0.71681416] mean value: 0.9478302670780547 key: test_recall value: [0.88888889 0.8 0.44444444 0.77777778 1. 0.66666667 0.55555556 0.55555556 0.33333333 1. ] mean value: 0.7022222222222222 key: train_recall value: [0.8902439 0.92592593 0.84146341 0.80487805 0.90243902 1. 0.93902439 0.95121951 0.53658537 0.98780488] mean value: 0.8779584462511292 key: test_roc_auc value: [0.94444444 0.62222222 0.72222222 0.88888889 0.94444444 0.61111111 0.72222222 0.61111111 0.61111111 0.66666667] mean value: 0.7344444444444445 key: train_roc_auc value: [0.94512195 0.95076784 0.92073171 0.90243902 0.95121951 0.94512195 0.9695122 0.91463415 0.76829268 0.79878049] mean value: 0.9066621499548329 key: test_jcc value: [0.88888889 0.53333333 0.44444444 0.77777778 0.9 0.46153846 0.5 0.41666667 0.3 0.6 ] mean value: 0.5822649572649573 key: train_jcc value: [0.8902439 0.90361446 0.84146341 0.80487805 0.90243902 0.9010989 0.93902439 0.84782609 0.53658537 0.71052632] mean value: 0.8277699908017685 MCC on Blind test: 0.51 Accuracy on Blind test: 0.76 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01393223 0.01368308 0.01342058 0.01372242 0.01387477 0.0146842 0.01359653 0.01407099 0.01386142 0.01323581] mean value: 0.013808202743530274 key: score_time value: [0.01012206 0.01130271 0.01128602 0.01126742 0.01138663 0.01142907 0.01143241 0.01137114 0.011307 0.01138449] mean value: 0.01122889518737793 key: test_mcc value: [0.59554321 0.48934516 0.47140452 0.70710678 0.53452248 0.62017367 0.47140452 0.56980288 0.26726124 0.70710678] mean value: 0.543367126070614 key: train_mcc value: [0.67220873 0.92666768 0.51140831 0.71034298 0.53033009 0.92932038 0.65275337 0.91470217 0.67180908 0.82951506] mean value: 0.7349057845494104 key: test_accuracy value: [0.78947368 0.73684211 0.72222222 0.83333333 0.72222222 0.77777778 0.72222222 0.77777778 0.61111111 0.83333333] mean value: 0.7526315789473684 key: train_accuracy value: [0.81595092 0.96319018 0.70731707 0.83536585 0.7195122 0.96341463 0.79878049 0.95731707 0.81097561 0.91463415] mean value: 0.848645817746521 key: test_fscore value: [0.8 0.7826087 0.76190476 0.8 0.61538462 0.71428571 0.66666667 0.75 0.46153846 0.8 ] mean value: 0.7152388915432394 key: train_fscore value: [0.84375 0.96341463 0.77358491 0.80291971 0.61016949 0.96202532 0.7480916 0.95705521 0.76691729 0.91358025] mean value: 0.834150841374106 key: test_precision value: [0.72727273 0.69230769 0.66666667 1. 1. 1. 0.83333333 0.85714286 0.75 1. ] mean value: 0.8526723276723277 key: train_precision value: [0.73636364 0.95180723 0.63076923 1. 1. 1. 1. 0.96296296 1. 
0.925 ] mean value: 0.9206903059011493 key: test_recall value: [0.88888889 0.9 0.88888889 0.66666667 0.44444444 0.55555556 0.55555556 0.66666667 0.33333333 0.66666667] mean value: 0.6566666666666666 key: train_recall value: [0.98780488 0.97530864 1. 0.67073171 0.43902439 0.92682927 0.59756098 0.95121951 0.62195122 0.90243902] mean value: 0.8072869617585064 key: test_roc_auc value: [0.79444444 0.72777778 0.72222222 0.83333333 0.72222222 0.77777778 0.72222222 0.77777778 0.61111111 0.83333333] mean value: 0.7522222222222221 key: train_roc_auc value: [0.81489009 0.96326408 0.70731707 0.83536585 0.7195122 0.96341463 0.79878049 0.95731707 0.81097561 0.91463415] mean value: 0.8485471243601325 key: test_jcc value: [0.66666667 0.64285714 0.61538462 0.66666667 0.44444444 0.55555556 0.5 0.6 0.3 0.66666667] mean value: 0.5658241758241758 key: train_jcc value: [0.72972973 0.92941176 0.63076923 0.67073171 0.43902439 0.92682927 0.59756098 0.91764706 0.62195122 0.84090909] mean value: 0.7304564435913073 MCC on Blind test: 0.39 Accuracy on Blind test: 0.62 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.11691046 0.10347772 0.10455108 0.10615063 0.10520744 0.10436273 0.10814619 0.10973048 0.10878086 0.10570621] mean value: 0.1073023796081543 key: score_time value: [0.01599526 0.01573205 0.01532221 0.01451206 0.01454496 0.01522088 0.01593971 0.01650333 0.01589751 0.01483393] mean value: 0.015450191497802735 key: test_mcc value: [1. 0.68543653 1. 1. 0.89442719 0.89442719 0.89442719 0.56980288 0.89442719 0.79772404] mean value: 0.8630672208352949 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.84210526 1. 1. 0.94444444 0.94444444 0.94444444 0.77777778 0.94444444 0.88888889] mean value: 0.9286549707602338 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.85714286 1. 1. 0.94736842 0.94117647 0.94117647 0.75 0.94736842 0.875 ] mean value: 0.9259232640424591 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.81818182 1. 1. 0.9 1. 1. 0.85714286 0.9 1. ] mean value: 0.9475324675324676 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.9 1. 1. 1. 0.88888889 0.88888889 0.66666667 1. 0.77777778] mean value: 0.9122222222222223 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.83888889 1. 1. 0.94444444 0.94444444 0.94444444 0.77777778 0.94444444 0.88888889] mean value: 0.9283333333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.75 1. 
1. 0.9 0.88888889 0.88888889 0.6 0.9 0.77777778] mean value: 0.8705555555555555 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03694677 0.03373933 0.05565667 0.05482864 0.0287919 0.0382278 0.03971195 0.04310226 0.05968165 0.02799916] mean value: 0.04186861515045166 key: score_time value: [0.02125978 0.01727152 0.03659678 0.01796579 0.01804519 0.01705861 0.02685332 0.026829 0.01939368 0.01746464] mean value: 0.021873831748962402 key: test_mcc value: [0.71611487 0.89893315 1. 0.89442719 0.89442719 1. 0.89442719 0.4472136 0.67082039 0.79772404] mean value: 0.8214087620957531 key: train_mcc value: [0.97575667 1. 0.98787834 1. 0.98787834 1. 0.98787834 0.98787834 0.98787834 0.98787834] mean value: 0.9903026713658697 key: test_accuracy value: [0.84210526 0.94736842 1. 0.94444444 0.94444444 1. 0.94444444 0.72222222 0.83333333 0.88888889] mean value: 0.9067251461988304 key: train_accuracy value: [0.98773006 1. 0.99390244 1. 0.99390244 1. 0.99390244 0.99390244 0.99390244 0.99390244] mean value: 0.9951144695496035 key: test_fscore value: [0.8 0.95238095 1. 
0.94117647 0.94736842 1. 0.94117647 0.70588235 0.84210526 0.875 ] mean value: 0.9005089930709126 key: train_fscore value: [0.98765432 1. 0.99386503 1. 0.99386503 1. 0.99386503 0.99393939 0.99393939 0.99393939] mean value: 0.9951067594830376 key: test_precision value: [1. 0.90909091 1. 1. 0.9 1. 1. 0.75 0.8 1. ] mean value: 0.9359090909090909 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 0.98795181 0.98795181 0.98795181] mean value: 0.9963855421686747 key: test_recall value: [0.66666667 1. 1. 0.88888889 1. 1. 0.88888889 0.66666667 0.88888889 0.77777778] mean value: 0.8777777777777778 key: train_recall value: [0.97560976 1. 0.98780488 1. 0.98780488 1. 0.98780488 1. 1. 1. ] mean value: 0.9939024390243902 key: test_roc_auc value: [0.83333333 0.94444444 1. 0.94444444 0.94444444 1. 0.94444444 0.72222222 0.83333333 0.88888889] mean value: 0.9055555555555556 key: train_roc_auc value: [0.98780488 1. 0.99390244 1. 0.99390244 1. 0.99390244 0.99390244 0.99390244 0.99390244] mean value: 0.9951219512195122 key: test_jcc value: [0.66666667 0.90909091 1. 0.88888889 0.9 1. 0.88888889 0.54545455 0.72727273 0.77777778] mean value: 0.8304040404040404 key: train_jcc value: [0.97560976 1. 0.98780488 1. 0.98780488 1. 
0.98780488 0.98795181 0.98795181 0.98795181] mean value: 0.9902879811930649 MCC on Blind test: 0.62 Accuracy on Blind test: 0.81 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.05209708 0.05312276 0.07166195 0.07947183 0.07822824 0.07365823 0.06713343 0.06822729 0.07329917 0.07334185] mean value: 0.0690241813659668 key: score_time value: [0.02100492 0.01554871 0.02480125 0.02019 0.02346206 0.02492499 0.02564955 0.02025509 0.02529502 0.02458215] mean value: 0.022571372985839843 key: test_mcc value: [0.36666667 0.36666667 0.4472136 0.77777778 0.56980288 0.4472136 0.56980288 0.47140452 0.77777778 0.70710678] mean value: 0.5501433146462763 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.68421053 0.68421053 0.72222222 0.88888889 0.77777778 0.66666667 0.77777778 0.72222222 0.88888889 0.83333333] mean value: 0.7646198830409356 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.7 0.70588235 0.88888889 0.75 0.5 0.75 0.66666667 0.88888889 0.8 ] mean value: 0.7316993464052287 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.7 0.75 0.88888889 0.85714286 1. 
0.85714286 0.83333333 0.88888889 1. ] mean value: 0.8442063492063492 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.7 0.66666667 0.88888889 0.66666667 0.33333333 0.66666667 0.55555556 0.88888889 0.66666667] mean value: 0.6699999999999999 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.68333333 0.68333333 0.72222222 0.88888889 0.77777778 0.66666667 0.77777778 0.72222222 0.88888889 0.83333333] mean value: 0.7644444444444444 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.53846154 0.54545455 0.8 0.6 0.33333333 0.6 0.5 0.8 0.66666667] mean value: 0.5883916083916084 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.57 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.30187654 0.29588366 0.30465126 0.29863501 0.29659557 0.29951715 0.30715179 0.30075073 0.30479479 0.31094193] mean value: 0.30207984447479247 key: score_time value: [0.01024866 0.00926256 0.00919223 0.00947666 0.00942016 0.00918174 0.00935078 0.01026249 0.01018763 0.00962329] mean value: 0.00962061882019043 key: test_mcc value: [0.89893315 0.80507649 1. 0.89442719 0.89442719 1. 0.89442719 0.67082039 0.89442719 1. ] mean value: 0.8952538793100003 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.94736842 0.89473684 1. 0.94444444 0.94444444 1. 0.94444444 0.83333333 0.94444444 1. ] mean value: 0.9453216374269006 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.90909091 1. 0.94117647 0.94736842 1. 0.94117647 0.82352941 0.94736842 1. ] mean value: 0.9450886574725584 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.83333333 1. 1. 0.9 1. 1. 0.875 0.9 1. ] mean value: 0.9508333333333333 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 1. 1. 0.88888889 1. 1. 0.88888889 0.77777778 1. 1. ] mean value: 0.9444444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.88888889 1. 0.94444444 0.94444444 1. 0.94444444 0.83333333 0.94444444 1. ] mean value: 0.9444444444444444 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.88888889 0.83333333 1. 0.88888889 0.9 1. 0.88888889 0.7 0.9 1. ] mean value: 0.9 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.02119136 0.02133584 0.02049875 0.01957178 0.02819681 0.01930714 0.01964378 0.01967049 0.01922727 0.01945806] mean value: 0.02081012725830078 key: score_time value: [0.01232815 0.01211047 0.01209617 0.01658344 0.01229763 0.01409721 0.01502109 0.01486349 0.01319528 0.01452518] mean value: 0.013711810111999512 key: test_mcc value: [0.48989795 0.45643546 0.70710678 0.53452248 0.79772404 0.79772404 0.79772404 0.70710678 0.70710678 0.89442719] mean value: 0.6889775537181078 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.68421053 0.68421053 0.83333333 0.72222222 0.88888889 0.88888889 0.88888889 0.83333333 0.83333333 0.94444444] mean value: 0.8201754385964912 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.76923077 0.85714286 0.7826087 0.9 0.9 0.9 0.85714286 0.85714286 0.94736842] mean value: 0.8520636457364146 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 key: test_precision value: [0.6 0.625 0.75 0.64285714 0.81818182 0.81818182 0.81818182 0.75 0.75 0.9 ] mean value: 0.7472402597402598 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.7 0.66666667 0.83333333 0.72222222 0.88888889 0.88888889 0.88888889 0.83333333 0.83333333 0.94444444] mean value: 0.82 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.625 0.75 0.64285714 0.81818182 0.81818182 0.81818182 0.75 0.75 0.9 ] mean value: 0.7472402597402598 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.0 Accuracy on Blind test: 0.62 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, 
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03742886 0.0137198 0.01522708 0.01617217 0.03871441 0.01832986 0.03243613 0.06556869 0.03291512 0.04149723] mean value: 0.031200933456420898 key: score_time value: [0.011657 0.01162052 0.01281309 0.01283717 0.0118587 0.0137105 0.02468085 0.01881003 0.02135801 0.02048373] mean value: 0.015982961654663085 key: test_mcc value: [0.78888889 0.57777778 0.79772404 0.89442719 0.77777778 0.56980288 0.34188173 0.56980288 0.56980288 0.70710678] mean value: 0.6594992828121856 key: train_mcc value: [0.96326408 0.95121218 0.96348628 0.97590007 0.93909422 0.97560976 0.92682927 0.97560976 0.96348628 0.95150257] mean value: 0.9585994467892424 key: test_accuracy value: [0.89473684 0.78947368 0.88888889 0.94444444 0.88888889 0.77777778 0.66666667 0.77777778 0.77777778 0.83333333] mean value: 0.8239766081871345 key: train_accuracy value: [0.98159509 0.97546012 0.98170732 0.98780488 0.9695122 0.98780488 0.96341463 0.98780488 0.98170732 0.97560976] mean value: 0.9792421068382463 key: test_fscore value: [0.88888889 0.8 0.875 0.94117647 0.88888889 0.75 0.625 0.75 0.75 0.8 ] mean value: 0.8068954248366014 key: train_fscore value: [0.98159509 0.97560976 0.98159509 0.98765432 0.96969697 0.98780488 0.96341463 0.98780488 0.98159509 0.97530864] mean value: 0.9792079355075015 key: test_precision value: [0.88888889 0.8 1. 1. 0.88888889 0.85714286 0.71428571 0.85714286 0.85714286 1. ] mean value: 0.8863492063492063 key: train_precision value: [0.98765432 0.96385542 0.98765432 1. 
0.96385542 0.98780488 0.96341463 0.98780488 0.98765432 0.9875 ] mean value: 0.9817198196580359 key: test_recall value: [0.88888889 0.8 0.77777778 0.88888889 0.88888889 0.66666667 0.55555556 0.66666667 0.66666667 0.66666667] mean value: 0.7466666666666666 key: train_recall value: [0.97560976 0.98765432 0.97560976 0.97560976 0.97560976 0.98780488 0.96341463 0.98780488 0.97560976 0.96341463] mean value: 0.9768142125865703 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:148: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:151: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_roc_auc value: [0.89444444 0.78888889 0.88888889 0.94444444 0.88888889 0.77777778 0.66666667 0.77777778 0.77777778 0.83333333] mean value: 0.8238888888888889 key: train_roc_auc value: [0.98163204 0.97553448 0.98170732 0.98780488 0.9695122 0.98780488 0.96341463 0.98780488 0.98170732 0.97560976] mean value: 0.9792532369768142 key: test_jcc value: [0.8 0.66666667 0.77777778 0.88888889 0.8 0.6 0.45454545 0.6 0.6 0.66666667] mean value: 0.6854545454545454 key: train_jcc value: [0.96385542 0.95238095 0.96385542 0.97560976 0.94117647 0.97590361 0.92941176 0.97590361 0.96385542 0.95180723] mean value: 0.9593759666664197 MCC on Blind test: 0.54 Accuracy on Blind test: 0.78 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic
RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', 
transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.24665713 0.2200036 0.20462394 0.23826599 0.20800304 0.20692873 0.26327562 0.2772572 0.24964952 0.20823383] mean value: 0.23228986263275148 key: score_time value: [0.02310228 0.01178694 0.01963353 0.02278471 0.02360058 0.02142 0.02313614 0.01516008 0.02230239 0.01481676] mean value: 0.019774341583251955 key: test_mcc value: [0.78888889 0.4719399 0.79772404 0.77777778 0.77777778 0.56980288 0.34188173 0.56980288 0.47140452 0.56980288] mean value: 0.6136803280450694 key: train_mcc value: [0.96326408 0.96326408 0.96348628 0.96348628 0.93909422 0.97560976 0.92682927 0.97560976 0.97560976 0.97560976] mean value: 0.962186323547305 key: test_accuracy value: [0.89473684 0.73684211 0.88888889 0.88888889 0.88888889 0.77777778 0.66666667 0.77777778 0.72222222 0.77777778] mean value: 0.802046783625731 key: train_accuracy value: [0.98159509 0.98159509 0.98170732 0.98170732 0.9695122 0.98780488 0.96341463 0.98780488 0.98780488 0.98780488] mean value: 0.9810751159658836 key: test_fscore value: [0.88888889 0.76190476 0.875 0.88888889 0.88888889 0.75 0.625 0.75 0.66666667 0.75 ] mean value: 0.7845238095238095 key: train_fscore value: [0.98159509 0.98159509 0.98159509 0.98159509 0.96969697 0.98780488 0.96341463 0.98780488 0.98780488 0.98780488] mean value: 0.9810711484136592 key: test_precision value: [0.88888889 0.72727273 1. 
0.88888889 0.88888889 0.85714286 0.71428571 0.85714286 0.83333333 0.85714286] mean value: 0.8512987012987012 key: train_precision value: [0.98765432 0.97560976 0.98765432 0.98765432 0.96385542 0.98780488 0.96341463 0.98780488 0.98780488 0.98780488] mean value: 0.9817062287088734 key: test_recall value: [0.88888889 0.8 0.77777778 0.88888889 0.88888889 0.66666667 0.55555556 0.66666667 0.55555556 0.66666667] mean value: 0.7355555555555555 key: train_recall value: [0.97560976 0.98765432 0.97560976 0.97560976 0.97560976 0.98780488 0.96341463 0.98780488 0.98780488 0.98780488] mean value: 0.9804727491719362 key: test_roc_auc value: [0.89444444 0.73333333 0.88888889 0.88888889 0.88888889 0.77777778 0.66666667 0.77777778 0.72222222 0.77777778] mean value: 0.8016666666666666 key: train_roc_auc value: [0.98163204 0.98163204 0.98170732 0.98170732 0.9695122 0.98780488 0.96341463 0.98780488 0.98780488 0.98780488] mean value: 0.9810825052694971 key: test_jcc value: [0.8 0.61538462 0.77777778 0.8 0.8 0.6 0.45454545 0.6 0.5 0.6 ] mean value: 0.6547707847707848 key: train_jcc value: [0.96385542 0.96385542 0.96385542 0.96385542 0.94117647 0.97590361 0.92941176 0.97590361 0.97590361 0.97590361] mean value: 0.9629624379872431 MCC on Blind test: 0.54 Accuracy on Blind test: 0.78 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', 
RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02531695 0.02301788 0.02503061 0.02468991 0.02239776 0.02449679 0.02545214 0.02418399 0.02235746 0.02392077] mean value: 0.024086427688598634 key: score_time value: [0.01164055 0.01155734 0.01147318 0.01160502 0.01145649 0.01149535 0.01150584 0.01157832 0.01148558 0.0115664 ] mean value: 0.011536407470703124 key: test_mcc value: [0.33333333 0. 0.66666667 0.50709255 0.46666667 0.1 0.46666667 0.69006556 0.1490712 0.1 ] mean value: 0.34795626440127836 key: train_mcc value: [0.80454045 0.82368777 0.80454045 0.77005354 0.73053854 0.78784497 0.82541478 0.78649572 0.84522516 0.80634253] mean value: 0.7984683900068174 key: test_accuracy value: [0.66666667 0.5 0.83333333 0.75 0.72727273 0.54545455 0.72727273 0.81818182 0.54545455 0.54545455] mean value: 0.6659090909090909 key: train_accuracy value: [0.90196078 0.91176471 0.90196078 0.88235294 0.86407767 0.89320388 0.91262136 0.89320388 0.9223301 0.90291262] mean value: 0.8986388730249382 key: test_fscore value: [0.66666667 0.57142857 0.83333333 0.72727273 0.72727273 0.54545455 0.72727273 0.8 0.44444444 0.54545455] mean value: 0.6588600288600288 key: train_fscore value: [0.9 0.91089109 0.9 0.875 0.86 0.89108911 0.91262136 0.89108911 0.92 0.9 ] mean value: 0.8960690666153994 key: test_precision value: [0.66666667 0.5 0.83333333 0.8 0.66666667 0.5 0.66666667 1. 
0.66666667 0.6 ] mean value: 0.69 key: train_precision value: [0.91836735 0.92 0.91836735 0.93333333 0.89583333 0.91836735 0.92156863 0.9 0.93877551 0.91836735] mean value: 0.9182980192076831 key: test_recall value: [0.66666667 0.66666667 0.83333333 0.66666667 0.8 0.6 0.8 0.66666667 0.33333333 0.5 ] mean value: 0.6533333333333333 key: train_recall value: [0.88235294 0.90196078 0.88235294 0.82352941 0.82692308 0.86538462 0.90384615 0.88235294 0.90196078 0.88235294] mean value: 0.8753016591251885 key: test_roc_auc value: [0.66666667 0.5 0.83333333 0.75 0.73333333 0.55 0.73333333 0.83333333 0.56666667 0.55 ] mean value: 0.6716666666666666 key: train_roc_auc value: [0.90196078 0.91176471 0.90196078 0.88235294 0.86444193 0.89347662 0.91270739 0.89309955 0.92213424 0.90271493] mean value: 0.8986613876319759 key: test_jcc value: [0.5 0.4 0.71428571 0.57142857 0.57142857 0.375 0.57142857 0.66666667 0.28571429 0.375 ] mean value: 0.503095238095238 key: train_jcc value: [0.81818182 0.83636364 0.81818182 0.77777778 0.75438596 0.80357143 0.83928571 0.80357143 0.85185185 0.81818182] mean value: 0.8121353256879573 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), 
('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.57380795 0.74630857 0.60518217 0.58629417 0.72743988 0.58024836 0.6250155 0.59622169 0.65928268 0.60975552]
mean value: 0.6309556484222412

key: score_time
value: [0.01315403 0.01296639 0.01179981 0.01184464 0.01358771 0.0130477 0.01297903 0.01181436 0.01576495 0.01186824]
mean value: 0.012882685661315918

key: test_mcc
value: [ 0.33333333 0.4472136 0. 0.50709255 0.06900656 0.46666667 -0.06900656 0.46666667 0.55901699 0.26666667]
mean value: 0.30466564760453485

key: train_mcc
value: [1. 1. 0.47140452 0.61209384 1. 1. 1. 0.67006033 1. 0.94190878]
mean value: 0.8695467473673132

key: test_accuracy
value: [0.66666667 0.66666667 0.5 0.75 0.54545455 0.72727273 0.45454545 0.72727273 0.72727273 0.63636364]
mean value: 0.6401515151515151

key: train_accuracy
value: [1. 1. 0.73529412 0.80392157 1. 1. 1. 0.83495146 1. 0.97087379]
mean value: 0.9345040928992956

key: test_fscore
value: [0.66666667 0.75 0.5 0.72727273 0.44444444 0.72727273 0.5 0.72727273 0.66666667 0.66666667]
mean value: 0.6376262626262625

key: train_fscore
value: [1. 1. 0.72727273 0.79166667 1. 1. 1. 0.83495146 1. 0.97029703]
mean value: 0.9324187879953043

key: test_precision
value: [0.66666667 0.6 0.5 0.8 0.5 0.66666667 0.42857143 0.8 1. 0.66666667]
mean value: 0.6628571428571428

key: train_precision
value: [1. 1. 0.75 0.84444444 1. 1. 1. 0.82692308 1. 0.98 ]
mean value: 0.9401367521367521

key: test_recall
value: [0.66666667 1. 0.5 0.66666667 0.4 0.8 0.6 0.66666667 0.5 0.66666667]
mean value: 0.6466666666666666

key: train_recall
value: [1. 1. 0.70588235 0.74509804 1. 1. 1. 0.84313725 1. 0.96078431]
mean value: 0.9254901960784314

key: test_roc_auc
value: [0.66666667 0.66666667 0.5 0.75 0.53333333 0.73333333 0.46666667 0.73333333 0.75 0.63333333]
mean value: 0.6433333333333333

key: train_roc_auc
value: [1. 1. 0.73529412 0.80392157 1. 1. 1. 0.83503017 1. 0.97077677]
mean value: 0.9345022624434389

key: test_jcc
value: [0.5 0.6 0.33333333 0.57142857 0.28571429 0.57142857 0.33333333 0.57142857 0.5 0.5 ]
mean value: 0.4766666666666666

key: train_jcc
value: [1. 1. 0.57142857 0.65517241 1. 1. 1. 0.71666667 1. 0.94230769]
mean value: 0.8885575344196034

MCC on Blind test: 0.6
Accuracy on Blind test: 0.81

Model_name: Gaussian NB
Model func: GaussianNB()
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())])

key: fit_time
value: [0.01224828 0.01091671 0.00891137 0.00843143 0.00929499 0.00856352 0.00876927 0.00851512 0.00861287 0.0084939 ]
mean value: 0.00927574634552002

key: score_time
value: [0.011657 0.00899959 0.00917506 0.00841236 0.00840473 0.00845885 0.00845647 0.00876474 0.00840163 0.00846171]
mean value: 0.008919215202331543

key: test_mcc
value: [0.35355339 0. 0.4472136 0.30151134 0.43033148 0.55901699 0.28867513 0. 0.2608746 0. ]
mean value: 0.26411765399276665

key: train_mcc
value: [0.49265379 0.4152274 0.4564139 0.43133109 0.40048439 0.3666794 0.40048439 0.42470149 0.37638633 0.45573272]
mean value: 0.4220094905915334

key: test_accuracy
value: [0.66666667 0.5 0.66666667 0.58333333 0.63636364 0.72727273 0.54545455 0.54545455 0.63636364 0.54545455]
mean value: 0.6053030303030302

key: train_accuracy
value: [0.70588235 0.64705882 0.70588235 0.65686275 0.6407767 0.62135922 0.6407767 0.65048544 0.62135922 0.66990291]
mean value: 0.6560346468684561

key: test_fscore
value: [0.71428571 0.66666667 0.75 0.70588235 0.71428571 0.76923077 0.66666667 0.70588235 0.71428571 0.70588235]
mean value: 0.7113068304244774

key: train_fscore
value: [0.76923077 0.73913043 0.75806452 0.74452555 0.73758865 0.72727273 0.73758865 0.73913043 0.72340426 0.75 ]
mean value: 0.742593598992669

key: test_precision
value: [0.625 0.5 0.6 0.54545455 0.55555556 0.625 0.5 0.54545455 0.625 0.54545455]
mean value: 0.5666919191919192

key: train_precision
value: [0.63291139 0.5862069 0.64383562 0.59302326 0.58426966 0.57142857 0.58426966 0.5862069 0.56666667 0.6 ]
mean value: 0.5948818621698756

key: test_recall
value: [0.83333333 1. 1. 1. 1. 1. 1. 1. 0.83333333 1. ]
mean value: 0.9666666666666667

key: train_recall
value: [0.98039216 1. 0.92156863 1. 1. 1. 1. 1. 1. 1.
] mean value: 0.9901960784313726 key: test_roc_auc value: [0.66666667 0.5 0.66666667 0.58333333 0.66666667 0.75 0.58333333 0.5 0.61666667 0.5 ] mean value: 0.6033333333333334 key: train_roc_auc value: [0.70588235 0.64705882 0.70588235 0.65686275 0.6372549 0.61764706 0.6372549 0.65384615 0.625 0.67307692] mean value: 0.6559766214177979 key: test_jcc value: [0.55555556 0.5 0.6 0.54545455 0.55555556 0.625 0.5 0.54545455 0.55555556 0.54545455] mean value: 0.5528030303030302 key: train_jcc value: [0.625 0.5862069 0.61038961 0.59302326 0.58426966 0.57142857 0.58426966 0.5862069 0.56666667 0.6 ] mean value: 0.5907461223244946 MCC on Blind test: 0.27 Accuracy on Blind test: 0.68 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, 
subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00996208 0.00888777 0.00997472 0.00940871 0.00896931 0.00880861 0.00899887 0.00877929 0.00869918 0.0094018 ] mean value: 0.009189033508300781 key: score_time value: [0.00949287 0.00865507 0.00963235 0.00957608 0.00890374 0.00917196 0.00849485 0.0089581 0.0091083 0.00884199] mean value: 0.00908353328704834 key: test_mcc value: [ 0. 
-0.19245009 0.19245009 0.35355339 0.1 0.1 -0.1 -0.06900656 -0.04303315 0.26666667] mean value: 0.06081803530345115 key: train_mcc value: [0.47809144 0.49362406 0.41692608 0.44177063 0.44167123 0.49697785 0.45999986 0.55337612 0.46410101 0.42167602] mean value: 0.4668214305743355 key: test_accuracy value: [0.5 0.41666667 0.58333333 0.66666667 0.54545455 0.54545455 0.45454545 0.45454545 0.45454545 0.63636364] mean value: 0.5257575757575758 key: train_accuracy value: [0.73529412 0.74509804 0.70588235 0.71568627 0.7184466 0.74757282 0.72815534 0.77669903 0.72815534 0.70873786] mean value: 0.7309727774604988 key: test_fscore value: [0.4 0.22222222 0.44444444 0.6 0.54545455 0.54545455 0.4 0.4 0.25 0.66666667] mean value: 0.4474242424242424 key: train_fscore value: [0.70967742 0.72916667 0.68085106 0.68131868 0.70103093 0.74 0.71428571 0.77227723 0.69565217 0.68085106] mean value: 0.7105110938756343 key: test_precision value: [0.5 0.33333333 0.66666667 0.75 0.5 0.5 0.4 0.5 0.5 0.66666667] mean value: 0.5316666666666666 key: train_precision value: [0.78571429 0.77777778 0.74418605 0.775 0.75555556 0.77083333 0.76086957 0.78 0.7804878 0.74418605] mean value: 0.7674610415499649 key: test_recall value: [0.33333333 0.16666667 0.33333333 0.5 0.6 0.6 0.4 0.33333333 0.16666667 0.66666667] mean value: 0.41 key: train_recall value: [0.64705882 0.68627451 0.62745098 0.60784314 0.65384615 0.71153846 0.67307692 0.76470588 0.62745098 0.62745098] mean value: 0.6626696832579185 key: test_roc_auc value: [0.5 0.41666667 0.58333333 0.66666667 0.55 0.55 0.45 0.46666667 0.48333333 0.63333333] mean value: 0.53 key: train_roc_auc value: [0.73529412 0.74509804 0.70588235 0.71568627 0.71907994 0.74792609 0.72869532 0.77658371 0.72718703 0.70795626] mean value: 0.7309389140271493 key: test_jcc value: [0.25 0.125 0.28571429 0.42857143 0.375 0.375 0.25 0.25 0.14285714 0.5 ] mean value: 0.2982142857142857 key: train_jcc value: [0.55 0.57377049 0.51612903 0.51666667 0.53968254 0.58730159 0.55555556 
0.62903226 0.53333333 0.51612903] mean value: 0.5517600496923607 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.0098145 0.00907564 0.00900006 0.00922799 0.00901246 0.00870895 0.00914049 0.00910187 0.00882554 0.0091033 ] mean value: 0.009101080894470214 key: score_time value: [0.01085329 0.01000524 0.00981808 0.00992918 0.00976562 0.00992513 0.00990963 0.01000428 0.00987315 0.00958896] mean value: 0.00996725559234619 key: test_mcc value: [ 0. -0.70710678 -0.50709255 0.33333333 0.3105295 -0.46666667 0.06900656 0.1 0.3105295 -0.1490712 ] mean value: -0.07065383065146226 key: train_mcc value: [0.30261377 0.45663332 0.34296227 0.39223227 0.35084901 0.43681633 0.32422165 0.44600762 0.40045707 0.39833814] mean value: 0.38511314468719693 key: test_accuracy value: [0.5 0.16666667 0.25 0.66666667 0.63636364 0.27272727 0.54545455 0.54545455 0.63636364 0.45454545] mean value: 0.4674242424242424 key: train_accuracy value: [0.64705882 0.7254902 0.66666667 0.69607843 0.66990291 0.7184466 0.66019417 0.7184466 0.69902913 0.69902913] mean value: 0.6900342661336379 key: test_fscore value: [0.5 0. 
0.30769231 0.66666667 0.66666667 0.2 0.44444444 0.54545455 0.6 0.57142857] mean value: 0.45023532023532026 key: train_fscore value: [0.6 0.70212766 0.62222222 0.69306931 0.63043478 0.72380952 0.63917526 0.68131868 0.71028037 0.68686869] mean value: 0.6689306494896705 key: test_precision value: [0.5 0. 0.28571429 0.66666667 0.57142857 0.2 0.5 0.6 0.75 0.5 ] mean value: 0.4573809523809524 key: train_precision value: [0.69230769 0.76744186 0.71794872 0.7 0.725 0.71698113 0.68888889 0.775 0.67857143 0.70833333] mean value: 0.7170473053590649 key: test_recall value: [0.5 0. 0.33333333 0.66666667 0.8 0.2 0.4 0.5 0.5 0.66666667] mean value: 0.45666666666666667 key: train_recall value: [0.52941176 0.64705882 0.54901961 0.68627451 0.55769231 0.73076923 0.59615385 0.60784314 0.74509804 0.66666667] mean value: 0.6315987933634992 key: test_roc_auc value: [0.5 0.16666667 0.25 0.66666667 0.65 0.26666667 0.53333333 0.55 0.65 0.43333333] mean value: 0.4666666666666667 key: train_roc_auc value: [0.64705882 0.7254902 0.66666667 0.69607843 0.67100302 0.71832579 0.66082202 0.71738311 0.6994721 0.69871795] mean value: 0.6901018099547511 key: test_jcc value: [0.33333333 0. 
0.18181818 0.5 0.5 0.11111111 0.28571429 0.375 0.42857143 0.4 ] mean value: 0.31155483405483403 key: train_jcc value: [0.42857143 0.54098361 0.4516129 0.53030303 0.46031746 0.56716418 0.46969697 0.51666667 0.55072464 0.52307692] mean value: 0.50391178052013 MCC on Blind test: 0.18 Accuracy on Blind test: 0.59 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01031351 0.01002502 0.00932026 0.00934005 0.01001763 0.00933385 0.00924444 0.00923753 0.00987935 0.00954986] mean value: 0.009626150131225586 key: score_time value: [0.01006293 0.00948262 0.00882435 0.00858355 0.00855827 0.00859284 0.00859046 0.00890946 0.00868011 0.00940084] mean value: 0.008968544006347657 key: test_mcc value: [ 0.35355339 -0.16903085 0. 
0.33333333 0.3105295 0.06900656 0.46666667 0.26666667 0.1490712 0.1 ] mean value: 0.1879796462452518 key: train_mcc value: [0.67303645 0.7856742 0.60972137 0.65158377 0.61317623 0.72878164 0.74896235 0.76763491 0.70975239 0.70878919] mean value: 0.6997112502455441 key: test_accuracy value: [0.66666667 0.41666667 0.5 0.66666667 0.63636364 0.54545455 0.72727273 0.63636364 0.54545455 0.54545455] mean value: 0.5886363636363636 key: train_accuracy value: [0.83333333 0.89215686 0.80392157 0.82352941 0.80582524 0.86407767 0.87378641 0.88349515 0.85436893 0.85436893] mean value: 0.8488863506567675 key: test_fscore value: [0.6 0.46153846 0.5 0.66666667 0.66666667 0.44444444 0.72727273 0.66666667 0.44444444 0.54545455] mean value: 0.5723154623154623 key: train_fscore value: [0.82105263 0.88888889 0.79591837 0.8125 0.81481481 0.8627451 0.87128713 0.88461538 0.84848485 0.85148515] mean value: 0.8451792310996761 key: test_precision value: [0.75 0.42857143 0.5 0.66666667 0.57142857 0.5 0.66666667 0.66666667 0.66666667 0.6 ] mean value: 0.6016666666666667 key: train_precision value: [0.88636364 0.91666667 0.82978723 0.86666667 0.78571429 0.88 0.89795918 0.86792453 0.875 0.86 ] mean value: 0.8666082201429165 key: test_recall value: [0.5 0.5 0.5 0.66666667 0.8 0.4 0.8 0.66666667 0.33333333 0.5 ] mean value: 0.5666666666666667 key: train_recall value: [0.76470588 0.8627451 0.76470588 0.76470588 0.84615385 0.84615385 0.84615385 0.90196078 0.82352941 0.84313725] mean value: 0.8263951734539969 key: test_roc_auc value: [0.66666667 0.41666667 0.5 0.66666667 0.65 0.53333333 0.73333333 0.63333333 0.56666667 0.55 ] mean value: 0.5916666666666667 key: train_roc_auc value: [0.83333333 0.89215686 0.80392157 0.82352941 0.80542986 0.86425339 0.87405732 0.8836727 0.8540724 0.85426094] mean value: 0.848868778280543 key: test_jcc value: [0.42857143 0.3 0.33333333 0.5 0.5 0.28571429 0.57142857 0.5 0.28571429 0.375 ] mean value: 0.4079761904761905 key: train_jcc value: [0.69642857 0.8 0.66101695 
0.68421053 0.6875 0.75862069 0.77192982 0.79310345 0.73684211 0.74137931] mean value: 0.7331031424997326 MCC on Blind test: 0.29 Accuracy on Blind test: 0.65 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.52144575 0.47171903 0.82721043 0.55670023 0.54132795 0.57578349 0.56852198 0.55415893 0.50108457 0.61252236] mean value: 0.5730474710464477 key: score_time value: [0.0121913 0.0121181 0.01214528 0.01250863 0.01261091 0.01247096 0.01241922 0.01233172 0.01238108 0.01257896] mean value: 0.012375617027282714 key: test_mcc value: [0.33333333 0. 0. 0.66666667 0.83333333 0.2608746 0.1490712 0.44854261 0.1490712 0.46666667] mean value: 0.3307559607947478 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.5 0.5 0.83333333 0.90909091 0.63636364 0.54545455 0.72727273 0.54545455 0.72727273] mean value: 0.6590909090909091 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.57142857 0.57142857 0.83333333 0.90909091 0.5 0.61538462 0.76923077 0.44444444 0.72727273] mean value: 0.6608280608280608 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.66666667 0.5 0.5 0.83333333 0.83333333 0.66666667 0.5 0.71428571 0.66666667 0.8 ] mean value: 0.6680952380952381 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.66666667 0.66666667 0.83333333 1. 0.4 0.8 0.83333333 0.33333333 0.66666667] mean value: 0.6866666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66666667 0.5 0.5 0.83333333 0.91666667 0.61666667 0.56666667 0.71666667 0.56666667 0.73333333] mean value: 0.6616666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.4 0.4 0.71428571 0.83333333 0.33333333 0.44444444 0.625 0.28571429 0.57142857] mean value: 0.5107539682539682 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.22 Accuracy on Blind test: 0.62 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01543951 0.01430631 0.01153779 0.01077008 0.01102376 0.01163149 0.01030493 0.01114631 0.01093674 0.01104331] mean value: 0.011814022064208984 key: score_time value: [0.01190424 0.00897908 0.00885177 0.00861216 0.0084064 0.00839639 0.00842237 0.00837159 0.0084188 0.00844646] mean value: 0.008880925178527833 key: test_mcc value: [0.50709255 0.70710678 0.84515425 0.66666667 0.83333333 0.46666667 0.44854261 0.63333333 0.69006556 0.46666667] mean value: 0.6264628428333725 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75 0.83333333 0.91666667 0.83333333 0.90909091 0.72727273 0.72727273 0.81818182 0.81818182 0.72727273] mean value: 0.806060606060606 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.76923077 0.85714286 0.90909091 0.83333333 0.90909091 0.72727273 0.66666667 0.83333333 0.8 0.72727273] mean value: 0.8032434232434232 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.71428571 0.75 1. 0.83333333 0.83333333 0.66666667 0.75 0.83333333 1. 0.8 ] mean value: 0.8180952380952381 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.83333333 1. 0.83333333 0.83333333 1. 0.8 0.6 0.83333333 0.66666667 0.66666667] mean value: 0.8066666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.75 0.83333333 0.91666667 0.83333333 0.91666667 0.73333333 0.71666667 0.81666667 0.83333333 0.73333333] mean value: 0.8083333333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.625 0.75 0.83333333 0.71428571 0.83333333 0.57142857 0.5 0.71428571 0.66666667 0.57142857] mean value: 0.6779761904761905 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.65 Accuracy on Blind test: 0.81 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), 
('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.0848 0.08455276 0.08480477 0.08505201 0.08489251 0.08458662 0.0842483 0.08415937 0.08451748 0.08450079] mean value: 0.08461146354675293 key: score_time value: [0.01697922 0.01714063 0.01709533 0.01729393 0.01706409 0.0169909 0.01697874 0.01725078 0.01703787 0.01676655] mean value: 0.017059803009033203 key: test_mcc value: [0.57735027 0. 0.19245009 0.66666667 0.55901699 0.44854261 0.44854261 0.06900656 0.1490712 0.44854261] mean value: 0.3559189615112927 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.75 0.5 0.58333333 0.83333333 0.72727273 0.72727273 0.72727273 0.54545455 0.54545455 0.72727273] mean value: 0.6666666666666666 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.57142857 0.66666667 0.83333333 0.76923077 0.66666667 0.66666667 0.61538462 0.44444444 0.76923077] mean value: 0.6669719169719169 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.5 0.55555556 0.83333333 0.625 0.75 0.75 0.57142857 0.66666667 0.71428571] mean value: 0.6966269841269841 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.66666667 0.83333333 0.83333333 1. 0.6 0.6 0.66666667 0.33333333 0.83333333] mean value: 0.6866666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.5 0.58333333 0.83333333 0.75 0.71666667 0.71666667 0.53333333 0.56666667 0.71666667] mean value: 0.6666666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.4 0.5 0.71428571 0.625 0.5 0.5 0.44444444 0.28571429 0.625 ] mean value: 0.5094444444444445 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.29 Accuracy on Blind test: 0.65 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00962925 0.00877833 0.00890136 0.00889707 0.00978112 0.00891805 0.00888753 0.00978589 0.00883222 0.00868726] mean value: 0.009109807014465333 key: score_time value: [0.00888324 0.00857091 0.00868154 0.00859356 0.0087707 0.00869417 0.00870609 0.00908661 0.00851727 0.00869942] mean value: 0.00872035026550293 key: test_mcc value: [ 0.33333333 -0.16903085 0.16903085 0.33333333 0.26666667 0.51639778 0.1 -0.06900656 -0.2608746 -0.06900656] mean value: 0.11508434035842094 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.41666667 0.58333333 0.66666667 0.63636364 0.72727273 0.54545455 0.45454545 0.36363636 0.45454545] mean value: 0.5515151515151515 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.46153846 0.54545455 0.66666667 0.6 0.57142857 0.54545455 0.4 0.22222222 0.4 ] mean value: 0.507943167943168 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.42857143 0.6 0.66666667 0.6 1. 
0.5 0.5 0.33333333 0.5 ] mean value: 0.5795238095238096 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.5 0.5 0.66666667 0.6 0.4 0.6 0.33333333 0.16666667 0.33333333] mean value: 0.4766666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.66666667 0.41666667 0.58333333 0.66666667 0.63333333 0.7 0.55 0.46666667 0.38333333 0.46666667] mean value: 0.5533333333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.3 0.375 0.5 0.42857143 0.4 0.375 0.25 0.125 0.25 ] mean value: 0.35035714285714287 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.04 Accuracy on Blind test: 0.54 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.07809305 1.08052611 1.08644509 1.07913208 1.0745616 1.07508063 1.06093812 1.06822157 1.06732559 1.06763673] mean value: 1.0737960577011108 key: score_time value: [0.09412026 0.09304404 0.09404373 0.09632349 0.09305692 0.08730507 0.08752012 0.09352827 0.09296465 0.09239817] mean value: 0.0924304723739624 key: test_mcc value: [0.50709255 0.33333333 0.57735027 0.66666667 0.83333333 0.46666667 0.44854261 0.63333333 0.43033148 0.44854261] mean value: 0.5345192865417064 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75 0.66666667 0.75 0.83333333 0.90909091 0.72727273 0.72727273 0.81818182 0.63636364 0.72727273] mean value: 0.7545454545454545 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.72727273 0.66666667 0.8 0.83333333 0.90909091 0.72727273 0.66666667 0.83333333 0.5 0.76923077] mean value: 0.7432867132867133 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.66666667 0.66666667 0.83333333 0.83333333 0.66666667 0.75 0.83333333 1. 0.71428571] mean value: 0.7764285714285715 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.66666667 1. 0.83333333 1. 0.8 0.6 0.83333333 0.33333333 0.83333333] mean value: 0.7566666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.75 0.66666667 0.75 0.83333333 0.91666667 0.73333333 0.71666667 0.81666667 0.66666667 0.71666667] mean value: 0.7566666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( [0.57142857 0.5 0.66666667 0.71428571 0.83333333 0.57142857 0.5 0.71428571 0.33333333 0.625 ] mean value: 0.6029761904761904 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.56 Accuracy on Blind test: 0.78 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.83386564 0.82139683 0.89413834 0.88040566 0.8696816 0.87154555 0.89003706 0.92764449 0.85521364 0.85460138] mean value: 0.8698530197143555 key: score_time value: [0.18552732 0.22606993 0.197824 0.18685389 0.24319506 0.16305614 0.231493 0.23278856 0.19366717 0.18959689] mean value: 0.20500719547271729 key: test_mcc value: [0.50709255 0.50709255 0.57735027 0.50709255 0.63333333 0.46666667 0.63333333 0.83333333 0.1490712 0.44854261] mean value: 0.5262908406440139 key: train_mcc value: [0.92156863 0.92156863 0.92156863 0.96152395 0.94193062 0.9229904 0.94190878 0.92304797 0.92232278 0.88419471] mean value: 0.9262625082295393 key: test_accuracy value: [0.75 0.75 0.75 0.75 0.81818182 0.72727273 0.81818182 0.90909091 0.54545455 0.72727273] mean value: 0.7545454545454545 key: train_accuracy value: [0.96078431 0.96078431 0.96078431 0.98039216 0.97087379 0.96116505 0.97087379 0.96116505 0.96116505 0.94174757] mean value: 0.9629735389301352 key: test_fscore value: [0.72727273 0.76923077 0.8 0.72727273 0.8 0.72727273 0.8 0.90909091 0.44444444 0.76923077] mean value: 0.7473815073815073 key: train_fscore value: [0.96078431 0.96078431 0.96078431 0.98 0.97087379 0.96226415 0.97142857 0.96153846 0.96078431 0.94230769] mean value: 0.963154991752785 key: test_precision value: [0.8 0.71428571 0.66666667 0.8 0.8 0.66666667 0.8 1. 0.66666667 0.71428571] mean value: 0.7628571428571429 key: train_precision value: [0.96078431 0.96078431 0.96078431 1. 
0.98039216 0.94444444 0.96226415 0.94339623 0.96078431 0.9245283 ] mean value: 0.9598162535454433 key: test_recall value: [0.66666667 0.83333333 1. 0.66666667 0.8 0.8 0.8 0.83333333 0.33333333 0.83333333] mean value: 0.7566666666666667 key: train_recall value: [0.96078431 0.96078431 0.96078431 0.96078431 0.96153846 0.98076923 0.98076923 0.98039216 0.96078431 0.96078431] mean value: 0.9668174962292609 key: test_roc_auc value: [0.75 0.75 0.75 0.75 0.81666667 0.73333333 0.81666667 0.91666667 0.56666667 0.71666667] mean value: 0.7566666666666667 key: train_roc_auc value: [0.96078431 0.96078431 0.96078431 0.98039216 0.97096531 0.96097285 0.97077677 0.96134992 0.96116139 0.94193062] mean value: 0.9629901960784315 key: test_jcc value: [0.57142857 0.625 0.66666667 0.57142857 0.66666667 0.57142857 0.66666667 0.83333333 0.28571429 0.625 ] mean value: 0.6083333333333333 key: train_jcc value: [0.9245283 0.9245283 0.9245283 0.96078431 0.94339623 0.92727273 0.94444444 0.92592593 0.9245283 0.89090909] mean value: 0.9290845936239943 MCC on Blind test: 0.6 Accuracy on Blind test: 0.81 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0218575 0.00876045 0.0088129 0.00888467 0.00886083 0.00874114 0.00883603 0.0088737 0.00875211 0.0096333 ] mean value: 0.010201263427734374 key: score_time value: [0.01380324 0.00858402 0.00932169 0.00855565 0.0086031 0.00860167 0.00856328 0.00865817 0.00851178 0.00928473] mean value: 0.009248733520507812 key: test_mcc value: [ 0. -0.19245009 0.19245009 0.35355339 0.1 0.1 -0.1 -0.06900656 -0.04303315 0.26666667] mean value: 0.06081803530345115 key: train_mcc value: [0.47809144 0.49362406 0.41692608 0.44177063 0.44167123 0.49697785 0.45999986 0.55337612 0.46410101 0.42167602] mean value: 0.4668214305743355 key: test_accuracy value: [0.5 0.41666667 0.58333333 0.66666667 0.54545455 0.54545455 0.45454545 0.45454545 0.45454545 0.63636364] mean value: 0.5257575757575758 key: train_accuracy value: [0.73529412 0.74509804 0.70588235 0.71568627 0.7184466 0.74757282 0.72815534 0.77669903 0.72815534 0.70873786] mean value: 0.7309727774604988 key: test_fscore value: [0.4 0.22222222 0.44444444 0.6 0.54545455 0.54545455 0.4 0.4 0.25 0.66666667] mean value: 0.4474242424242424 key: train_fscore value: [0.70967742 0.72916667 0.68085106 0.68131868 0.70103093 0.74 0.71428571 0.77227723 0.69565217 0.68085106] mean value: 0.7105110938756343 key: test_precision value: [0.5 0.33333333 0.66666667 0.75 0.5 0.5 0.4 0.5 0.5 0.66666667] mean value: 0.5316666666666666 key: train_precision value: [0.78571429 0.77777778 0.74418605 0.775 0.75555556 0.77083333 0.76086957 0.78 0.7804878 0.74418605] mean value: 0.7674610415499649 key: test_recall value: [0.33333333 0.16666667 0.33333333 0.5 0.6 0.6 0.4 
0.33333333 0.16666667 0.66666667] mean value: 0.41 key: train_recall value: [0.64705882 0.68627451 0.62745098 0.60784314 0.65384615 0.71153846 0.67307692 0.76470588 0.62745098 0.62745098] mean value: 0.6626696832579185 key: test_roc_auc value: [0.5 0.41666667 0.58333333 0.66666667 0.55 0.55 0.45 0.46666667 0.48333333 0.63333333] mean value: 0.53 key: train_roc_auc value: [0.73529412 0.74509804 0.70588235 0.71568627 0.71907994 0.74792609 0.72869532 0.77658371 0.72718703 0.70795626] mean value: 0.7309389140271493 key: test_jcc value: [0.25 0.125 0.28571429 0.42857143 0.375 0.375 0.25 0.25 0.14285714 0.5 ] mean value: 0.2982142857142857 key: train_jcc value: [0.55 0.57377049 0.51612903 0.51666667 0.53968254 0.58730159 0.55555556 0.62903226 0.53333333 0.51612903] mean value: 0.5517600496923607 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.07481241 0.22507048 0.04041743 0.04007125 0.04189968 0.04101229 0.04206061 0.04118204 0.04240179 0.04212117] mean value: 0.06310491561889649 key: score_time value: [0.01080179 0.01181221 0.01070094 0.01109576 0.01027584 0.01039696 0.01037216 0.01016545 0.01079583 0.01102448] mean value: 0.010744142532348632 key: test_mcc value: [0.70710678 0.70710678 1. 0.84515425 0.83333333 0.69006556 0.44854261 0.83333333 0.83333333 0.67082039] mean value: 0.7568796383266433 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.83333333 0.83333333 1. 0.91666667 0.90909091 0.81818182 0.72727273 0.90909091 0.90909091 0.81818182] mean value: 0.8674242424242424 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.85714286 1. 0.92307692 0.90909091 0.83333333 0.66666667 0.90909091 0.90909091 0.85714286] mean value: 0.8721778221778221 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.75 1. 0.85714286 0.83333333 0.71428571 0.75 1. 1. 0.75 ] mean value: 0.8404761904761905 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 0.6 0.83333333 0.83333333 1. ] mean value: 0.9266666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.83333333 1. 0.91666667 0.91666667 0.83333333 0.71666667 0.91666667 0.91666667 0.8 ] mean value: 0.8683333333333334 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 
1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.75 1. 0.85714286 0.83333333 0.71428571 0.5 0.83333333 0.83333333 0.75 ] mean value: 0.7821428571428571 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.89 Accuracy on Blind test: 0.95 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.02431273 0.04793501 0.04520869 0.01918387 0.01961684 0.05043674 0.04669476 0.01964712 0.01939678 0.06448698] mean value: 0.0356919527053833 key: score_time value: [0.021065 0.02142334 0.02149987 0.01160932 0.01161528 0.02253747 0.02240157 0.01179361 0.01169348 0.02211642] mean value: 0.017775535583496094 key: test_mcc value: [ 0. 0.16903085 0.16903085 0.66666667 -0.1 0.55901699 -0.46666667 -0.44854261 0.3105295 0.43033148] mean value: 0.12893970673098185 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.5 0.58333333 0.58333333 0.83333333 0.45454545 0.72727273 0.27272727 0.27272727 0.63636364 0.63636364] mean value: 0.55 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.57142857 0.61538462 0.61538462 0.83333333 0.4 0.76923077 0.2 0.2 0.6 0.5 ] mean value: 0.5304761904761904 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 
1.] mean value: 1.0 key: test_precision value: [0.5 0.57142857 0.57142857 0.83333333 0.4 0.625 0.2 0.25 0.75 1. ] mean value: 0.5701190476190476 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.66666667 0.66666667 0.83333333 0.4 1. 0.2 0.16666667 0.5 0.33333333] mean value: 0.5433333333333333 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.5 0.58333333 0.58333333 0.83333333 0.45 0.75 0.26666667 0.28333333 0.65 0.66666667] mean value: 0.5566666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.4 0.44444444 0.44444444 0.71428571 0.25 0.625 0.11111111 0.11111111 0.42857143 0.33333333] mean value: 0.3862301587301587 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01184154 0.01136208 0.00888276 0.00873661 0.00852537 0.00840068 0.00836301 0.0085299 0.00847936 0.00849724] mean value: 0.009161853790283203 key: score_time value: [0.01137066 0.00879383 0.00873899 0.00863862 0.00825405 0.00836372 0.00826645 0.00838542 0.00832129 0.00829244] mean value: 0.008742547035217286 key: test_mcc value: [ 0.16903085 0.16903085 0. 0.35355339 0.55901699 0.63333333 0.46666667 -0.1 -0.06900656 0.06900656] mean value: 0.22506320868596277 key: train_mcc value: [0.39223227 0.47067872 0.41464421 0.41176471 0.35941135 0.43702866 0.43702866 0.42093969 0.45625943 0.45639893] mean value: 0.42563866339517525 key: test_accuracy value: [0.58333333 0.58333333 0.5 0.66666667 0.72727273 0.81818182 0.72727273 0.45454545 0.45454545 0.54545455] mean value: 0.6060606060606061 key: train_accuracy value: [0.69607843 0.73529412 0.70588235 0.70588235 0.67961165 0.7184466 0.7184466 0.70873786 0.72815534 0.72815534] mean value: 0.7124690652960214 key: test_fscore value: [0.61538462 0.61538462 0.5 0.71428571 0.76923077 0.8 0.72727273 0.5 0.4 0.61538462] mean value: 0.6256943056943057 key: train_fscore value: [0.69306931 0.73267327 0.72222222 0.70588235 0.69158879 0.7184466 0.7184466 0.72222222 0.7254902 0.72 ] mean value: 0.7150041556651703 key: test_precision value: [0.57142857 0.57142857 0.5 0.625 0.625 0.8 0.66666667 0.5 0.5 0.57142857] mean value: 0.5930952380952381 key: train_precision value: [0.7 0.74 0.68421053 0.70588235 0.67272727 0.7254902 0.7254902 0.68421053 0.7254902 0.73469388] mean value: 0.7098195144086342 key: test_recall value: [0.66666667 0.66666667 0.5 
0.83333333 1. 0.8 0.8 0.5 0.33333333 0.66666667] mean value: 0.6766666666666666 key: train_recall value: [0.68627451 0.7254902 0.76470588 0.70588235 0.71153846 0.71153846 0.71153846 0.76470588 0.7254902 0.70588235] mean value: 0.7213046757164404 key: test_roc_auc value: [0.58333333 0.58333333 0.5 0.66666667 0.75 0.81666667 0.73333333 0.45 0.46666667 0.53333333] mean value: 0.6083333333333333 key: train_roc_auc value: [0.69607843 0.73529412 0.70588235 0.70588235 0.67929864 0.71851433 0.71851433 0.70927602 0.72812971 0.72794118] mean value: 0.7124811463046757 key: test_jcc value: [0.44444444 0.44444444 0.33333333 0.55555556 0.625 0.66666667 0.57142857 0.33333333 0.25 0.44444444] mean value: 0.46686507936507937 key: train_jcc value: [0.53030303 0.578125 0.56521739 0.54545455 0.52857143 0.56060606 0.56060606 0.56521739 0.56923077 0.5625 ] mean value: 0.556583167738059 MCC on Blind test: 0.22 Accuracy on Blind test: 0.62 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01127481 0.01294661 0.01390338 0.01304626 0.0125351 0.01292324 0.01466441 0.01396894 0.01350164 0.01370597] mean value: 0.013247036933898925 key: score_time value: [0.01279116 0.01113725 0.01108098 0.01119018 0.01119089 0.0111618 0.01162934 0.01145244 0.01148176 0.01136208] mean value: 0.011447787284851074 key: test_mcc value: [ 0.84515425 0.16903085 0.84515425 0.35355339 0.69006556 0.2608746 -0.26666667 0.43033148 0.46666667 0.3105295 ] mean value: 0.41046938923293347 key: train_mcc value: [0.85370578 0.85370578 0.88507941 0.56980288 0.72812971 0.71696975 0.90305552 0.78824078 0.7730823 0.83786936] mean value: 0.7909641286354275 key: test_accuracy value: [0.91666667 0.58333333 0.91666667 0.66666667 0.81818182 0.63636364 0.36363636 0.63636364 0.72727273 0.63636364] mean value: 0.6901515151515152 key: train_accuracy value: [0.92156863 0.92156863 0.94117647 0.74509804 0.86407767 0.84466019 0.95145631 0.88349515 0.87378641 0.91262136] mean value: 0.8859508852084523 key: test_fscore value: [0.90909091 0.61538462 0.92307692 0.6 0.83333333 0.5 0.36363636 0.5 0.72727273 0.6 ] mean value: 0.6571794871794872 key: train_fscore value: [0.91489362 0.91489362 0.94339623 0.65789474 0.86538462 0.82222222 0.95238095 0.86666667 0.88695652 0.90322581] mean value: 0.8727914982144953 key: test_precision value: [1. 0.57142857 0.85714286 0.75 0.71428571 0.66666667 0.33333333 1. 0.8 0.75 ] mean value: 0.7442857142857143 key: train_precision value: [1. 1. 0.90909091 1. 0.86538462 0.97368421 0.94339623 1. 0.796875 1. 
] mean value: 0.9488430961416935 key: test_recall value: [0.83333333 0.66666667 1. 0.5 1. 0.4 0.4 0.33333333 0.66666667 0.5 ] mean value: 0.63 key: train_recall value: [0.84313725 0.84313725 0.98039216 0.49019608 0.86538462 0.71153846 0.96153846 0.76470588 1. 0.82352941] mean value: 0.8283559577677224 key: test_roc_auc value: [0.91666667 0.58333333 0.91666667 0.66666667 0.83333333 0.61666667 0.36666667 0.66666667 0.73333333 0.65 ] mean value: 0.6950000000000001 key: train_roc_auc value: [0.92156863 0.92156863 0.94117647 0.74509804 0.86406486 0.84596531 0.95135747 0.88235294 0.875 0.91176471] mean value: 0.8859917043740573 key: test_jcc value: [0.83333333 0.44444444 0.85714286 0.42857143 0.71428571 0.33333333 0.22222222 0.33333333 0.57142857 0.42857143] mean value: 0.5166666666666666 key: train_jcc value: [0.84313725 0.84313725 0.89285714 0.49019608 0.76271186 0.69811321 0.90909091 0.76470588 0.796875 0.82352941] mean value: 0.7824354006254942 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01307654 0.01308775 0.01293898 0.01283979 0.01317358 0.01239753 0.01288557 0.01263547 0.01240683 0.01329827] mean value: 0.012874031066894531 key: score_time value: [0.01016164 0.01159859 0.01144123 0.01127625 0.01130557 0.01118708 0.01118135 0.01122093 0.01121593 0.01126552] mean value: 0.011185407638549805 key: test_mcc value: [ 0.35355339 0. 0.30151134 0.70710678 0.46666667 -0.1490712 0. 0.55901699 0.55901699 0.1490712 ] mean value: 0.29468721717741464 key: train_mcc value: [0.58489765 0.86692145 0.39886202 0.60246408 0.92304797 0.74927733 0.57166195 0.67789495 0.76763491 0.69330532] mean value: 0.6835967621723328 key: test_accuracy value: [0.66666667 0.5 0.58333333 0.83333333 0.72727273 0.45454545 0.45454545 0.72727273 0.72727273 0.54545455] mean value: 0.621969696969697 key: train_accuracy value: [0.75490196 0.93137255 0.6372549 0.7745098 0.96116505 0.86407767 0.74757282 0.81553398 0.88349515 0.82524272] mean value: 0.8195126594327051 key: test_fscore value: [0.71428571 0.625 0.70588235 0.85714286 0.72727273 0.25 0.625 0.66666667 0.66666667 0.44444444] mean value: 0.6282361429420252 key: train_fscore value: [0.80314961 0.93457944 0.73381295 0.81300813 0.96078431 0.84782609 0.8 0.77108434 0.88461538 0.78571429] mean value: 0.8334574533634218 key: test_precision value: [0.625 0.5 0.54545455 0.75 0.66666667 0.33333333 0.45454545 1. 1. 0.66666667] mean value: 0.6541666666666667 key: train_precision value: [0.67105263 0.89285714 0.57954545 0.69444444 0.98 0.975 0.66666667 1. 0.86792453 1. 
] mean value: 0.8327490868394543 key: test_recall value: [0.83333333 0.83333333 1. 1. 0.8 0.2 1. 0.5 0.5 0.33333333] mean value: 0.7 key: train_recall value: [1. 0.98039216 1. 0.98039216 0.94230769 0.75 1. 0.62745098 0.90196078 0.64705882] mean value: 0.8829562594268476 key: test_roc_auc value: [0.66666667 0.5 0.58333333 0.83333333 0.73333333 0.43333333 0.5 0.75 0.75 0.56666667] mean value: 0.6316666666666667 key: train_roc_auc value: [0.75490196 0.93137255 0.6372549 0.7745098 0.96134992 0.86519608 0.74509804 0.81372549 0.8836727 0.82352941] mean value: 0.8190610859728507 key: test_jcc value: [0.55555556 0.45454545 0.54545455 0.75 0.57142857 0.14285714 0.45454545 0.5 0.5 0.28571429] mean value: 0.476010101010101 key: train_jcc value: [0.67105263 0.87719298 0.57954545 0.68493151 0.9245283 0.73584906 0.66666667 0.62745098 0.79310345 0.64705882] mean value: 0.7207379852784521 MCC on Blind test: 0.71 Accuracy on Blind test: 0.86 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, 
interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.09563279 0.08601046 0.08657742 0.08628511 0.08625317 0.08617806 0.08666682 0.08725524 0.08604097 0.08653998] mean value: 0.08734400272369384 key: score_time value: [0.01442719 0.01441336 0.01466346 0.01440644 0.01426506 0.01482415 0.01447535 0.01438451 0.01432061 0.01447439] mean value: 0.01446545124053955 key: test_mcc value: [0.33333333 0.70710678 0.70710678 0.84515425 0.83333333 0.69006556 0.63333333 0.26666667 0.69006556 0.83333333] mean value: 0.653949893578632 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.83333333 0.83333333 0.91666667 0.90909091 0.81818182 0.81818182 0.63636364 0.81818182 0.90909091] mean value: 0.8159090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.85714286 0.8 0.92307692 0.90909091 0.83333333 0.8 0.66666667 0.8 0.90909091] mean value: 0.8165068265068265 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.75 1. 0.85714286 0.83333333 0.71428571 0.8 0.66666667 1. 1. ] mean value: 0.8288095238095238 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 1. 0.66666667 1. 1. 1. 0.8 0.66666667 0.66666667 0.83333333] mean value: 0.83 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.66666667 0.83333333 0.83333333 0.91666667 0.91666667 0.83333333 0.81666667 0.63333333 0.83333333 0.91666667] mean value: 0.8200000000000001 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.75 0.66666667 0.85714286 0.83333333 0.71428571 0.66666667 0.5 0.66666667 0.83333333] mean value: 0.6988095238095238 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.67 Accuracy on Blind test: 0.84 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03288817 0.03888249 0.04622889 0.03841782 0.03640127 0.03987908 0.05036545 0.03022933 0.04997826 0.03656101] mean value: 0.039983177185058595 key: score_time value: [0.02168846 0.01711726 0.02786493 0.02710056 0.02722168 0.0274992 0.02268314 0.03377604 0.0342288 0.03278279] mean value: 0.027196288108825684 key: test_mcc value: [0.70710678 0.70710678 0.70710678 0.50709255 0.63333333 0.46666667 0.44854261 1. 0.83333333 0.63333333] mean value: 0.6643622176635949 key: train_mcc value: [0.96078431 0.98058068 0.96152395 0.96078431 1. 1. 0.98076205 0.98076923 0.98076923 1. 
] mean value: 0.9805973758711093 key: test_accuracy value: [0.83333333 0.83333333 0.83333333 0.75 0.81818182 0.72727273 0.72727273 1. 0.90909091 0.81818182] mean value: 0.8250000000000001 key: train_accuracy value: [0.98039216 0.99019608 0.98039216 0.98039216 1. 1. 0.99029126 0.99029126 0.99029126 1. ] mean value: 0.9902246335427375 key: test_fscore value: [0.85714286 0.85714286 0.8 0.72727273 0.8 0.72727273 0.66666667 1. 0.90909091 0.83333333] mean value: 0.8177922077922077 key: train_fscore value: [0.98039216 0.99029126 0.98 0.98039216 1. 1. 0.99047619 0.99029126 0.99029126 1. ] mean value: 0.9902134290609448 key: test_precision value: [0.75 0.75 1. 0.8 0.8 0.66666667 0.75 1. 1. 0.83333333] mean value: 0.835 key: train_precision value: [0.98039216 0.98076923 1. 0.98039216 1. 1. 0.98113208 0.98076923 0.98076923 1. ] mean value: 0.988422408150488 key: test_recall value: [1. 1. 0.66666667 0.66666667 0.8 0.8 0.6 1. 0.83333333 0.83333333] mean value: 0.8200000000000001 key: train_recall value: [0.98039216 1. 0.96078431 0.98039216 1. 1. 1. 1. 1. 1. ] mean value: 0.9921568627450981 key: test_roc_auc value: [0.83333333 0.83333333 0.83333333 0.75 0.81666667 0.73333333 0.71666667 1. 0.91666667 0.81666667] mean value: 0.8250000000000001 key: train_roc_auc value: [0.98039216 0.99019608 0.98039216 0.98039216 1. 1. 0.99019608 0.99038462 0.99038462 1. ] mean value: 0.990233785822021 key: test_jcc value: [0.75 0.75 0.66666667 0.57142857 0.66666667 0.57142857 0.5 1. 0.83333333 0.71428571] mean value: 0.7023809523809523 key: train_jcc value: [0.96153846 0.98076923 0.96078431 0.96153846 1. 1. 0.98113208 0.98076923 0.98076923 1. 
] mean value: 0.9807301004581803 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.01395178 0.01651168 0.0164876 0.01868868 0.0394485 0.0395906 0.01672554 0.03937197 0.03952646 0.03938293] mean value: 0.027968573570251464 key: score_time value: [0.01182222 0.01168132 0.01170325 0.02135181 0.02071929 0.01184392 0.0117321 0.02081585 0.02087355 0.02112746] mean value: 0.016367077827453613 key: test_mcc value: [ 0.16903085 -0.50709255 0.16903085 0. 0.55901699 0.2608746 -0.1490712 -0.26666667 -0.2608746 0.3105295 ] mean value: 0.028477777996665087 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.58333333 0.25 0.58333333 0.5 0.72727273 0.63636364 0.45454545 0.36363636 0.36363636 0.63636364] mean value: 0.5098484848484849 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.54545455 0.18181818 0.61538462 0.5 0.76923077 0.5 0.25 0.36363636 0.22222222 0.6 ] mean value: 0.45477466977466974 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.6 0.2 0.57142857 0.5 0.625 0.66666667 0.33333333 0.4 0.33333333 0.75 ] mean value: 0.49797619047619046 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.16666667 0.66666667 0.5 1. 0.4 0.2 0.33333333 0.16666667 0.5 ] mean value: 0.44333333333333336 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.58333333 0.25 0.58333333 0.5 0.75 0.61666667 0.43333333 0.36666667 0.38333333 0.65 ] mean value: 0.5116666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.375 0.1 0.44444444 0.33333333 0.625 0.33333333 0.14285714 0.22222222 0.125 0.42857143] mean value: 0.31297619047619046 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.54 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.23015356 0.21968126 0.21211481 0.2232914 0.22050714 0.21650219 0.21859217 0.22216439 0.21892643 0.21719503] mean value: 0.21991283893585206 key: score_time value: [0.00920582 0.00885582 0.0088439 0.00896955 0.00899076 0.00892496 0.00895596 0.00880098 0.00909376 0.0088861 ] mean value: 0.008952760696411132 key: test_mcc value: [0.70710678 0.84515425 1. 0.84515425 0.83333333 0.69006556 0.44854261 0.63333333 0.69006556 0.82807867] mean value: 0.7520834360778311 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.83333333 0.91666667 1. 0.91666667 0.90909091 0.81818182 0.72727273 0.81818182 0.81818182 0.90909091] mean value: 0.8666666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.92307692 1. 0.92307692 0.90909091 0.83333333 0.66666667 0.83333333 0.8 0.92307692] mean value: 0.8668797868797868 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.85714286 1. 0.85714286 0.83333333 0.71428571 0.75 0.83333333 1. 0.85714286] mean value: 0.8452380952380952 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 0.6 0.83333333 0.66666667 1. ] mean value: 0.91 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.91666667 1. 0.91666667 0.91666667 0.83333333 0.71666667 0.81666667 0.83333333 0.9 ] mean value: 0.8683333333333334 key: train_roc_auc value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.85714286 1. 0.85714286 0.83333333 0.71428571 0.5 0.71428571 0.66666667 0.85714286] mean value: 0.775 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01520419 0.01495075 0.01517487 0.01506615 0.01498771 0.01498652 0.01938462 0.01636744 0.03074956 0.0168345 ] mean value: 0.01737062931060791 key: score_time value: [0.01191139 0.01167727 0.01162887 0.01161718 0.01165557 0.01290107 0.01460361 0.014503 0.02054787 0.01282907] mean value: 0.013387489318847656 key: test_mcc value: [ 0. 0.50709255 0.35355339 -0.16903085 0.06900656 0.1490712 0.46666667 0.26666667 -0.1 0.1 ] mean value: 0.16430261802522353 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.5 0.75 0.66666667 0.41666667 0.54545455 0.54545455 0.72727273 0.63636364 0.45454545 0.54545455] mean value: 0.5787878787878787 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.5 0.76923077 0.71428571 0.36363636 0.44444444 0.61538462 0.72727273 0.66666667 0.5 0.54545455] mean value: 0.5846375846375846 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 
1.] mean value: 1.0 key: test_precision value: [0.5 0.71428571 0.625 0.4 0.5 0.5 0.66666667 0.66666667 0.5 0.6 ] mean value: 0.5672619047619047 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.83333333 0.83333333 0.33333333 0.4 0.8 0.8 0.66666667 0.5 0.5 ] mean value: 0.6166666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.5 0.75 0.66666667 0.41666667 0.53333333 0.56666667 0.73333333 0.63333333 0.45 0.55 ] mean value: 0.5800000000000001 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.33333333 0.625 0.55555556 0.22222222 0.28571429 0.44444444 0.57142857 0.5 0.33333333 0.375 ] mean value: 0.4246031746031746 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.15 Accuracy on Blind test: 0.54 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03670239 0.0243206 0.03172588 0.0315814 0.03152347 0.03162026 0.0317049 0.03175974 0.03160596 0.03164554] mean value: 0.031419014930725096 key: score_time value: [0.01155329 0.01707244 0.01971054 0.02251983 0.02160668 0.0201211 0.01141977 0.02167892 0.02119327 0.02242517] mean value: 0.01893010139465332 key: test_mcc value: [0.16903085 0. 0.84515425 0.66666667 0.69006556 0.26666667 0.26666667 0.55901699 0.69006556 0.26666667] mean value: 0.44199998854005423 key: train_mcc value: [0.94135745 1. 0.96152395 0.92227807 0.98076923 0.9229904 0.94193062 0.9229904 0.96116139 0.96187302] mean value: 0.9516874525588295 key: test_accuracy value: [0.58333333 0.5 0.91666667 0.83333333 0.81818182 0.63636364 0.63636364 0.72727273 0.81818182 0.63636364] mean value: 0.7106060606060606 key: train_accuracy value: [0.97058824 1. 0.98039216 0.96078431 0.99029126 0.96116505 0.97087379 0.96116505 0.98058252 0.98058252] mean value: 0.975642490005711 key: test_fscore value: [0.61538462 0.625 0.90909091 0.83333333 0.83333333 0.6 0.6 0.66666667 0.8 0.66666667] mean value: 0.7149475524475524 key: train_fscore value: [0.97029703 1. 0.98 0.96 0.99029126 0.96226415 0.97087379 0.96 0.98039216 0.98 ] mean value: 0.97541183860528 key: test_precision value: [0.57142857 0.5 1. 0.83333333 0.71428571 0.6 0.6 1. 1. 0.66666667] mean value: 0.7485714285714286 key: train_precision value: [0.98 1. 1. 0.97959184 1. 0.94444444 0.98039216 0.97959184 0.98039216 1. ] mean value: 0.9844412431639322 key: test_recall value: [0.66666667 0.83333333 0.83333333 0.83333333 1. 
0.6 0.6 0.5 0.66666667 0.66666667] mean value: 0.72 /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:168: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:171: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: train_recall value: [0.96078431 1. 0.96078431 0.94117647 0.98076923 0.98076923 0.96153846 0.94117647 0.98039216 0.96078431] mean value: 0.9668174962292609 key: test_roc_auc value: [0.58333333 0.5 0.91666667 0.83333333 0.83333333 0.63333333 0.63333333 0.75 0.83333333 0.63333333] mean value: 0.715 key: train_roc_auc value: [0.97058824 1. 0.98039216 0.96078431 0.99038462 0.96097285 0.97096531 0.96097285 0.98058069 0.98039216] mean value: 0.975603318250377 key: test_jcc value: [0.44444444 0.45454545 0.83333333 0.71428571 0.71428571 0.42857143 0.42857143 0.5 0.66666667 0.5 ] mean value: 0.5684704184704185 key: train_jcc value: [0.94230769 1. 
0.96078431 0.92307692 0.98076923 0.92727273 0.94339623 0.92307692 0.96153846 0.96078431] mean value: 0.9523006811908032 MCC on Blind test: 0.6 Accuracy on Blind test: 0.81 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.25216579 0.17971325 0.18904495 0.18921661 0.27039409 0.28169656 0.18438935 0.18478012 0.20222712 0.11990452] mean value: 0.2053532361984253 key: score_time value: [0.02050495 0.02130389 0.01642084 0.01335454 0.02752233 0.02227807 0.01993704 0.0214355 0.02152538 0.01164627] mean value: 0.019592881202697754 key: test_mcc value: [0.16903085 0.35355339 0.50709255 0.50709255 0.63333333 0.26666667 0.3105295 0.43033148 0.69006556 0.3105295 ] mean value: 0.4178225392875605 key: train_mcc value: [0.94135745 1. 1. 0.60972137 1. 1. 1. 1. 1. 1. ] mean value: 0.9551078823340995 key: test_accuracy value: [0.58333333 0.66666667 0.75 0.75 0.81818182 0.63636364 0.63636364 0.63636364 0.81818182 0.63636364] mean value: 0.6931818181818182 key: train_accuracy value: [0.97058824 1. 1. 0.80392157 1. 1. 1. 1. 1. 1. ] mean value: 0.9774509803921568 key: test_fscore value: [0.61538462 0.71428571 0.72727273 0.72727273 0.8 0.6 0.66666667 0.5 0.8 0.6 ] mean value: 0.675088245088245 key: train_fscore value: [0.97029703 1. 1. 0.79591837 1. 1. 
1. 1. 1. 1. ] mean value: 0.9766215397049909 key: test_precision value: [0.57142857 0.625 0.8 0.8 0.8 0.6 0.57142857 1. 1. 0.75 ] mean value: 0.7517857142857143 key: train_precision value: [0.98 1. 1. 0.82978723 1. 1. 1. 1. 1. 1. ] mean value: 0.9809787234042553 key: test_recall value: [0.66666667 0.83333333 0.66666667 0.66666667 0.8 0.6 0.8 0.33333333 0.66666667 0.5 ] mean value: 0.6533333333333333 key: train_recall value: [0.96078431 1. 1. 0.76470588 1. 1. 1. 1. 1. 1. ] mean value: 0.9725490196078431 key: test_roc_auc value: [0.58333333 0.66666667 0.75 0.75 0.81666667 0.63333333 0.65 0.66666667 0.83333333 0.65 ] mean value: 0.7 key: train_roc_auc value: [0.97058824 1. 1. 0.80392157 1. 1. 1. 1. 1. 1. ] mean value: 0.9774509803921568 key: test_jcc value: [0.44444444 0.55555556 0.57142857 0.57142857 0.66666667 0.42857143 0.5 0.33333333 0.66666667 0.42857143] mean value: 0.5166666666666666 key: train_jcc value: [0.94230769 1. 1. 0.66101695 1. 1. 1. 1. 1. 1. ] mean value: 0.9603324641460235 MCC on Blind test: 0.6 Accuracy on Blind test: 0.81 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.03044844 0.02831769 0.02996039 0.03043008 0.02798986 0.02752709 0.02288771 0.02915406 0.02952576 0.02516389] mean value: 0.028140497207641602 key: score_time value: [0.01249909 0.01163912 0.01143289 0.01147437 0.01142287 0.01163936 0.01132512 0.0114665 0.01140189 0.01849389] mean value: 0.012279510498046875 key: test_mcc value: [0.89893315 0.68543653 0.4472136 0.79772404 0.56980288 0.47140452 0.56980288 0.2236068 0.55555556 0.70710678] mean value: 0.5926586727385535 key: train_mcc value: [0.8039452 0.84201212 0.84202713 0.84202713 0.82951506 0.83025669 0.81762054 0.89084029 0.80511756 0.85391256] mean value: 0.8357274278898956 key: test_accuracy value: [0.94736842 0.84210526 0.72222222 0.88888889 0.77777778 0.72222222 0.77777778 0.61111111 0.77777778 0.83333333] mean value: 0.7900584795321637 key: train_accuracy value: [0.90184049 0.9202454 0.92073171 0.92073171 0.91463415 0.91463415 0.90853659 0.94512195 0.90243902 
0.92682927] mean value: 0.9175744426155918 key: test_fscore value: [0.94117647 0.85714286 0.70588235 0.875 0.8 0.66666667 0.75 0.58823529 0.77777778 0.8 ] mean value: 0.7761881419234361 key: train_fscore value: [0.90123457 0.91719745 0.91925466 0.91925466 0.91358025 0.9125 0.9068323 0.94409938 0.90123457 0.92592593] mean value: 0.9161113754660095 key: test_precision value: [1. 0.81818182 0.75 1. 0.72727273 0.83333333 0.85714286 0.625 0.77777778 1. ] mean value: 0.8388708513708514 key: train_precision value: [0.9125 0.94736842 0.93670886 0.93670886 0.925 0.93589744 0.92405063 0.96202532 0.9125 0.9375 ] mean value: 0.9330259527836143 key: test_recall value: [0.88888889 0.9 0.66666667 0.77777778 0.88888889 0.55555556 0.66666667 0.55555556 0.77777778 0.66666667] mean value: 0.7344444444444445 key: train_recall value: [0.8902439 0.88888889 0.90243902 0.90243902 0.90243902 0.8902439 0.8902439 0.92682927 0.8902439 0.91463415] mean value: 0.8998644986449864 key: test_roc_auc value: [0.94444444 0.83888889 0.72222222 0.88888889 0.77777778 0.72222222 0.77777778 0.61111111 0.77777778 0.83333333] mean value: 0.7894444444444444 key: train_roc_auc value: [0.90191207 0.9200542 0.92073171 0.92073171 0.91463415 0.91463415 0.90853659 0.94512195 0.90243902 0.92682927] mean value: 0.9175624811803673 key: test_jcc value: [0.88888889 0.75 0.54545455 0.77777778 0.66666667 0.5 0.6 0.41666667 0.63636364 0.66666667] mean value: 0.6448484848484848 key: train_jcc value: [0.82022472 0.84705882 0.85057471 0.85057471 0.84090909 0.83908046 0.82954545 0.89411765 0.82022472 0.86206897] mean value: 0.845437930481974 MCC on Blind test: 0.4 Accuracy on Blind test: 0.73 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result(

[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None,
                           colsample_bynode=None, colsample_bytree=None,
                           enable_categorical=False, gamma=None, gpu_id=None,
                           importance_type=None, interaction_constraints=None,
                           learning_rate=None, max_delta_step=None, max_depth=None,
                           min_child_weight=None, missing=nan,
                           monotone_constraints=None, n_estimators=100, n_jobs=None,
                           num_parallel_tree=None, predictor=None, random_state=42,
                           reg_alpha=None, reg_lambda=None, scale_pos_weight=None,
                           subsample=None, tree_method=None, use_label_encoder=False,
                           validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
    ColumnTransformer(remainder='passthrough',
        transformers=[('num', MinMaxScaler(),
                       Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change',
                              'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa',
                              'kd_values',
                              ...
                              'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101',
                              'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'],
                             dtype='object', length=166)),
                      ('cat', OneHotEncoder(),
                       Index(['ss_class', 'aa_prop_change', 'electrostatics_change',
                              'polarity_change', 'water_change', 'drtype_mode_labels',
                              'active_site'],
                             dtype='object'))])),
    ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.86987448 0.64905834 0.65774155 0.85697532 0.66762519 0.64877272 0.83129621 0.69926929 0.71107221 0.85802889]
mean value: 0.7449714183807373

key: score_time
value: [0.0130856 0.01300454 0.01301813 0.01331544 0.01296258 0.01906753 0.01196313 0.01293635 0.01503968 0.0136528 ]
mean value: 0.01380457878112793

key: test_mcc
value: [0.89893315 0.9 0.56980288 0.89442719 0.89442719 0.67082039 0.56980288 0.47140452 0.70710678 0.79772404]
mean value: 0.7374449026992183

key: train_mcc
value: [1. 1. 1. 1. 0.98787834 1. 1. 1. 1. 1. ]
mean value: 0.9987878339907214

key: test_accuracy
value: [0.94736842 0.94736842 0.77777778 0.94444444 0.94444444 0.83333333 0.77777778 0.72222222 0.83333333 0.88888889]
mean value: 0.8616959064327485

key: train_accuracy
value: [1. 1. 1. 1. 0.99390244 1. 1. 1. 1. 1. ]
mean value: 0.999390243902439

key: test_fscore
value: [0.94117647 0.94736842 0.75 0.94117647 0.94736842 0.82352941 0.75 0.66666667 0.8 0.875 ]
mean value: 0.8442285861713107

key: train_fscore
value: [1. 1. 1. 1. 0.99386503 1. 1. 1. 1. 1. ]
mean value: 0.9993865030674847

key: test_precision
value: [1. 1. 0.85714286 1. 0.9 0.875 0.85714286 0.83333333 1. 1. ]
mean value: 0.9322619047619047

key: train_precision
value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0

key: test_recall
value: [0.88888889 0.9 0.66666667 0.88888889 1. 0.77777778 0.66666667 0.55555556 0.66666667 0.77777778]
mean value: 0.7788888888888889

key: train_recall
value: [1. 1. 1. 1. 0.98780488 1. 1. 1. 1. 1. ]
mean value: 0.998780487804878

key: test_roc_auc
value: [0.94444444 0.95 0.77777778 0.94444444 0.94444444 0.83333333 0.77777778 0.72222222 0.83333333 0.88888889]
mean value: 0.8616666666666666

key: train_roc_auc
value: [1. 1. 1. 1. 0.99390244 1. 1. 1. 1. 1. ]
mean value: 0.999390243902439

key: test_jcc
value: [0.88888889 0.9 0.6 0.88888889 0.9 0.7 0.6 0.5 0.66666667 0.77777778]
mean value: 0.7422222222222222

key: train_jcc
value: [1. 1. 1. 1. 0.98780488 1. 1. 1. 1. 1.
] mean value: 0.998780487804878 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01251841 0.01035118 0.00904322 0.00870728 0.0086 0.00857806 0.00861526 0.00875258 0.00855613 0.00859094] mean value: 0.009231305122375489 key: score_time value: [0.01379108 0.00886035 0.00869012 0.00853968 0.0084424 0.00838447 0.00841713 0.00837684 0.00846767 0.00849962] mean value: 0.00904693603515625 key: test_mcc value: [0.28752732 0.01807754 0.3721042 0.24253563 0. 0.4472136 0.23570226 0. 
0.55555556 0.53452248] mean value: 0.26932385786240387 key: train_mcc value: [0.39666608 0.40983565 0.45749571 0.48276756 0.49111711 0.48245064 0.45222959 0.44501237 0.42597138 0.41007219] mean value: 0.44536182687198245 key: test_accuracy value: [0.63157895 0.52631579 0.66666667 0.55555556 0.5 0.66666667 0.61111111 0.5 0.77777778 0.72222222] mean value: 0.6157894736842106 key: train_accuracy value: [0.6809816 0.65030675 0.70121951 0.70731707 0.70121951 0.7195122 0.68902439 0.70121951 0.68292683 0.65853659] mean value: 0.6892263953314379 key: test_fscore value: [0.66666667 0.66666667 0.72727273 0.69230769 0.64 0.75 0.66666667 0.60869565 0.77777778 0.7826087 ] mean value: 0.6978662545184284 key: train_fscore value: [0.73737374 0.73732719 0.75862069 0.76699029 0.76777251 0.76767677 0.75598086 0.75376884 0.74757282 0.74074074] mean value: 0.7533824448496093 key: test_precision value: [0.58333333 0.52941176 0.61538462 0.52941176 0.5 0.6 0.58333333 0.5 0.77777778 0.64285714] mean value: 0.5861509732097967 key: train_precision value: [0.62931034 0.58823529 0.63636364 0.63709677 0.62790698 0.65517241 0.62204724 0.64102564 0.62096774 0.59701493] mean value: 0.6255140992468455 key: test_recall value: [0.77777778 0.9 0.88888889 1. 0.88888889 1. 0.77777778 0.77777778 0.77777778 1. 
] mean value: 0.8788888888888888 key: train_recall value: [0.8902439 0.98765432 0.93902439 0.96341463 0.98780488 0.92682927 0.96341463 0.91463415 0.93902439 0.97560976] mean value: 0.9487654320987654 key: test_roc_auc value: [0.63888889 0.50555556 0.66666667 0.55555556 0.5 0.66666667 0.61111111 0.5 0.77777778 0.72222222] mean value: 0.6144444444444445 key: train_roc_auc value: [0.67968985 0.65236375 0.70121951 0.70731707 0.70121951 0.7195122 0.68902439 0.70121951 0.68292683 0.65853659] mean value: 0.6893029208069859 key: test_jcc value: [0.5 0.5 0.57142857 0.52941176 0.47058824 0.6 0.5 0.4375 0.63636364 0.64285714] mean value: 0.538814935064935 key: train_jcc value: [0.584 0.58394161 0.61111111 0.62204724 0.62307692 0.62295082 0.60769231 0.60483871 0.59689922 0.58823529] mean value: 0.6044793240087645 MCC on Blind test: 0.27 Accuracy on Blind test: 0.68 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, 
missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00905609 0.00897646 0.00951314 0.00907636 0.00915027 0.00914383 0.00920677 0.00918961 0.00925541 0.00928283] mean value: 0.009185075759887695 key: score_time value: [0.00866008 0.008641 0.00868464 0.0087235 0.0087688 0.00884247 0.0088706 0.00899959 0.00887132 0.00893378] mean value: 0.008799576759338379 key: test_mcc value: [ 0.15555556 0.04494666 0.33333333 -0.11396058 0. 0.11396058 0.34188173 -0.2236068 0.33333333 0.4472136 ] mean value: 0.14326574068486644 key: train_mcc value: [0.44782413 0.41266129 0.42711521 0.45125307 0.48838629 0.43915503 0.49147319 0.48838629 0.47564513 0.40270863] mean value: 0.452460824775168 key: test_accuracy value: [0.57894737 0.52631579 0.66666667 0.44444444 0.5 0.55555556 0.66666667 0.38888889 0.66666667 0.72222222] mean value: 0.5716374269005848 key: train_accuracy value: [0.72392638 0.70552147 0.71341463 0.72560976 0.74390244 0.7195122 0.74390244 0.74390244 0.73780488 0.70121951] mean value: 0.7258716145443663 key: test_fscore value: [0.55555556 0.57142857 0.66666667 0.5 0.47058824 0.5 0.625 0.42105263 0.66666667 0.73684211] mean value: 0.5713800432453683 key: train_fscore value: [0.72727273 0.68831169 0.70807453 0.72727273 0.75 0.71604938 0.75862069 0.7375 0.73939394 0.70658683] mean value: 0.72590825151311 key: test_precision value: [0.55555556 0.54545455 0.66666667 0.45454545 0.5 0.57142857 0.71428571 0.4 0.66666667 0.7 ] mean value: 0.5774603174603175 key: train_precision value: [0.72289157 0.7260274 0.72151899 0.72289157 0.73255814 0.725 0.7173913 0.75641026 0.73493976 0.69411765] mean value: 0.7253746623520101 key: test_recall 
value: [0.55555556 0.6 0.66666667 0.55555556 0.44444444 0.44444444 0.55555556 0.44444444 0.66666667 0.77777778] mean value: 0.5711111111111111 key: train_recall value: [0.73170732 0.65432099 0.69512195 0.73170732 0.76829268 0.70731707 0.80487805 0.7195122 0.74390244 0.7195122 ] mean value: 0.7276272207166516 key: test_roc_auc value: [0.57777778 0.52222222 0.66666667 0.44444444 0.5 0.55555556 0.66666667 0.38888889 0.66666667 0.72222222] mean value: 0.5711111111111111 key: train_roc_auc value: [0.72387835 0.70520927 0.71341463 0.72560976 0.74390244 0.7195122 0.74390244 0.74390244 0.73780488 0.70121951] mean value: 0.7258355916892503 key: test_jcc value: [0.38461538 0.4 0.5 0.33333333 0.30769231 0.33333333 0.45454545 0.26666667 0.5 0.58333333] mean value: 0.40635198135198136 key: train_jcc value: [0.57142857 0.52475248 0.54807692 0.57142857 0.6 0.55769231 0.61111111 0.58415842 0.58653846 0.5462963 ] mean value: 0.5701483133661351 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00906515 0.00898194 0.00869346 0.00964141 0.00977135 0.0086484 0.00956511 0.01074243 0.00970697 0.00954914] mean value: 0.009436535835266113 key: score_time value: [0.01515126 0.01482105 0.01515651 0.01516604 0.01514649 0.01451993 0.01451945 0.01603603 0.01516533 0.01491308] mean value: 0.015059518814086913 key: test_mcc value: [ 0.15555556 0.16854997 0.11396058 0. 0.11111111 0.12403473 0.2236068 -0.23570226 0.2236068 0.33333333] mean value: 0.1218056611769099 key: train_mcc value: [0.43930081 0.42455778 0.40610963 0.40391344 0.44556639 0.47592838 0.44232587 0.50093211 0.43229648 0.40391344] mean value: 0.4374844323355661 key: test_accuracy value: [0.57894737 0.57894737 0.55555556 0.5 0.55555556 0.55555556 0.61111111 0.38888889 0.61111111 0.66666667] mean value: 0.560233918128655 key: train_accuracy value: [0.71779141 0.71165644 0.70121951 0.70121951 0.7195122 0.73780488 0.7195122 0.75 0.71341463 0.70121951] mean value: 0.7173350291785127 key: test_fscore value: [0.55555556 0.55555556 0.5 0.4 0.55555556 0.42857143 0.58823529 0.47619048 0.58823529 0.66666667] mean value: 0.5314565826330533 key: train_fscore value: [0.7012987 0.69677419 0.67973856 0.68789809 0.69333333 0.73291925 0.7012987 0.74213836 0.68874172 0.68789809] mean value: 0.701203901120714 key: test_precision value: [0.55555556 0.625 0.57142857 0.5 0.55555556 0.6 0.625 0.41666667 0.625 0.66666667] mean value: 0.5740873015873016 key: train_precision value: [0.75 0.72972973 0.73239437 0.72 0.76470588 0.74683544 0.75 0.76623377 0.75362319 0.72 ] mean value: 0.7433522375957392 key: test_recall value: 
[0.55555556 0.5 0.44444444 0.33333333 0.55555556 0.33333333 0.55555556 0.55555556 0.55555556 0.66666667] mean value: 0.5055555555555555 key: train_recall value: [0.65853659 0.66666667 0.63414634 0.65853659 0.63414634 0.7195122 0.65853659 0.7195122 0.63414634 0.65853659] mean value: 0.6642276422764227 key: test_roc_auc value: [0.57777778 0.58333333 0.55555556 0.5 0.55555556 0.55555556 0.61111111 0.38888889 0.61111111 0.66666667] mean value: 0.5605555555555556 key: train_roc_auc value: [0.71815718 0.71138211 0.70121951 0.70121951 0.7195122 0.73780488 0.7195122 0.75 0.71341463 0.70121951] mean value: 0.7173441734417345 key: test_jcc value: [0.38461538 0.38461538 0.33333333 0.25 0.38461538 0.27272727 0.41666667 0.3125 0.41666667 0.5 ] mean value: 0.36557400932400935 key: train_jcc value: [0.54 0.53465347 0.51485149 0.52427184 0.53061224 0.57843137 0.54 0.59 0.52525253 0.52427184] mean value: 0.5402344782514942 MCC on Blind test: -0.08 Accuracy on Blind test: 0.49 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01260471 0.01172137 0.01098704 0.01196337 0.0121212 0.01210618 0.01222682 0.01223993 0.01208854 0.01191831] mean value: 0.011997747421264648 key: score_time value: [0.01024818 0.00978136 0.01010489 0.01006174 0.01007962 0.01016092 0.00994873 0.00993824 0.00985742 0.00976133] mean value: 0.009994244575500489 key: test_mcc value: [0.36803496 0.68888889 0.33333333 0.55555556 0.47140452 0. 0.55555556 0.11396058 0.55555556 0.47140452] mean value: 0.4113693471913179 key: train_mcc value: [0.70567864 0.70551039 0.78141806 0.68313005 0.75812978 0.75699875 0.75632256 0.74440079 0.69517365 0.73192505] mean value: 0.7318687723980281 key: test_accuracy value: [0.68421053 0.84210526 0.66666667 0.77777778 0.72222222 0.5 0.77777778 0.55555556 0.77777778 0.72222222] mean value: 0.7026315789473684 key: train_accuracy value: [0.85276074 0.85276074 0.8902439 0.84146341 0.87804878 0.87804878 0.87804878 0.87195122 0.84756098 0.86585366] mean value: 0.865674098458776 key: test_fscore value: [0.625 0.84210526 0.66666667 0.77777778 0.76190476 0.4 0.77777778 0.5 0.77777778 0.66666667] mean value: 0.6795676691729323 key: train_fscore value: [0.85542169 0.85185185 0.8875 0.84337349 0.88235294 0.875 0.87654321 0.8742515 0.84662577 0.86419753] mean value: 0.8657117978369109 key: test_precision value: [0.71428571 0.88888889 0.66666667 0.77777778 0.66666667 0.5 0.77777778 0.57142857 0.77777778 0.83333333] mean value: 0.7174603174603175 key: train_precision value: [0.8452381 0.85185185 0.91025641 0.83333333 0.85227273 0.8974359 0.8875 0.85882353 0.85185185 0.875 ] mean value: 0.8663563696651932 
key: test_recall value: [0.55555556 0.8 0.66666667 0.77777778 0.88888889 0.33333333 0.77777778 0.44444444 0.77777778 0.55555556] mean value: 0.6577777777777778 key: train_recall value: [0.86585366 0.85185185 0.86585366 0.85365854 0.91463415 0.85365854 0.86585366 0.8902439 0.84146341 0.85365854] mean value: 0.865672990063234 key: test_roc_auc value: [0.67777778 0.84444444 0.66666667 0.77777778 0.72222222 0.5 0.77777778 0.55555556 0.77777778 0.72222222] mean value: 0.7022222222222222 key: train_roc_auc value: [0.85267992 0.85275519 0.8902439 0.84146341 0.87804878 0.87804878 0.87804878 0.87195122 0.84756098 0.86585366] mean value: 0.8656654622101776 key: test_jcc value: [0.45454545 0.72727273 0.5 0.63636364 0.61538462 0.25 0.63636364 0.33333333 0.63636364 0.5 ] mean value: 0.528962703962704 key: train_jcc value: [0.74736842 0.74193548 0.79775281 0.72916667 0.78947368 0.77777778 0.78021978 0.77659574 0.73404255 0.76086957] mean value: 0.7635202485876846 MCC on Blind test: 0.05 Accuracy on Blind test: 0.57 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.66131997 0.66211009 0.79937983 0.73952007 0.7480371 0.93877983 0.74273109 0.72177291 0.8511014 0.69477081] mean value: 0.7559523105621337 key: score_time value: [0.0125308 0.0154655 0.01416278 0.01435876 0.01442623 0.0205605 0.02195811 0.02220106 0.0232017 0.01383901] mean value: 0.017270445823669434 key: test_mcc value: [0.71611487 0.80903983 0.56980288 0.89442719 0.70710678 0.56980288 0.4472136 0.2236068 0.70710678 0.70710678] mean value: 0.6351328401401198 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.84210526 0.89473684 0.77777778 0.94444444 0.83333333 0.77777778 0.72222222 0.61111111 0.83333333 0.83333333] mean value: 0.8070175438596492 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.88888889 0.75 0.94117647 0.85714286 0.75 0.73684211 0.63157895 0.8 0.8 ] mean value: 0.7955629269251561 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.85714286 1. 0.75 0.85714286 0.7 0.6 1. 1. ] mean value: 0.8764285714285714 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.8 0.66666667 0.88888889 1. 
0.66666667 0.77777778 0.66666667 0.66666667 0.66666667] mean value: 0.7466666666666666 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.9 0.77777778 0.94444444 0.83333333 0.77777778 0.72222222 0.61111111 0.83333333 0.83333333] mean value: 0.8066666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.8 0.6 0.88888889 0.75 0.6 0.58333333 0.46153846 0.66666667 0.66666667] mean value: 0.6683760683760683 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.34 Accuracy on Blind test: 0.7 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, 
subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.0178659 0.01578808 0.01413012 0.0122447 0.01217771 0.01225829 0.01304674 0.01155305 0.01350141 0.01376867] mean value: 0.013633465766906739 key: score_time value: [0.01277757 0.01019597 0.00928235 0.00891542 0.00881386 0.00882745 0.00914836 0.00933933 0.00935149 0.00943089] mean value: 0.009608268737792969 key: test_mcc value: [0.89893315 1. 0.89442719 0.89442719 0.89442719 0.77777778 0.77777778 0.47140452 0.79772404 1. ] mean value: 0.840689883451479 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.94736842 1. 0.94444444 0.94444444 0.94444444 0.88888889 0.88888889 0.72222222 0.88888889 1. ] mean value: 0.9169590643274853 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 1. 0.94736842 0.94117647 0.94736842 0.88888889 0.88888889 0.66666667 0.875 1. ] mean value: 0.9096534227726178 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.9 1. 0.9 0.88888889 0.88888889 0.83333333 1. 1. ] mean value: 0.9411111111111111 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 1. 1. 0.88888889 1. 0.88888889 0.88888889 0.55555556 0.77777778 1. ] mean value: 0.8888888888888888 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 1. 0.94444444 0.94444444 0.94444444 0.88888889 0.88888889 0.72222222 0.88888889 1. ] mean value: 0.9166666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 1. 0.9 0.88888889 0.9 0.8 0.8 0.5 0.77777778 1. ] mean value: 0.8455555555555556 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.72 Accuracy on Blind test: 0.86 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09470534 0.09246397 0.09804273 0.09987426 0.09740973 0.09735131 0.09562063 0.0994997 0.09801269 0.10053802] mean value: 0.09735183715820313 key: score_time value: [0.017555 0.01803017 0.0188818 0.01894879 0.01877737 0.01888657 0.01826334 0.01904607 0.01873159 0.01754045] mean value: 0.01846611499786377 key: test_mcc value: [0.80507649 0.9 0.67082039 1. 0.47140452 0.79772404 0.4472136 0.4472136 1. 0.77777778] mean value: 0.7317230403935541 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.89473684 0.94736842 0.83333333 1. 0.72222222 0.88888889 0.72222222 0.72222222 1. 0.88888889] mean value: 0.8619883040935672 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.875 0.94736842 0.82352941 1. 0.76190476 0.875 0.70588235 0.70588235 1. 0.88888889] mean value: 0.8583456189493341 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.875 1. 0.66666667 1. 0.75 0.75 1. 
0.88888889] mean value: 0.8930555555555555 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77777778 0.9 0.77777778 1. 0.88888889 0.77777778 0.66666667 0.66666667 1. 0.88888889] mean value: 0.8344444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.88888889 0.95 0.83333333 1. 0.72222222 0.88888889 0.72222222 0.72222222 1. 0.88888889] mean value: 0.8616666666666666 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.77777778 0.9 0.7 1. 0.61538462 0.77777778 0.54545455 0.54545455 1. 0.8 ] mean value: 0.7661849261849262 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.4 Accuracy on Blind test: 0.73 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, 
n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.01011419 0.01173449 0.01065707 0.01012039 0.00895905 0.01001811 0.01012397 0.01015162 0.01015472 0.00998425] mean value: 0.010201787948608399 key: score_time value: [0.0099287 0.01066327 0.00916076 0.00908279 0.00873351 0.00936961 0.00939894 0.00948119 0.00942492 0.00959873] mean value: 0.009484243392944337 key: test_mcc value: [ 0.26257545 0.59554321 0.4472136 0.62017367 0.34188173 0.47140452 0.2236068 -0.34188173 0.4472136 0.62017367] mean value: 0.368790452109 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.63157895 0.78947368 0.72222222 0.77777778 0.66666667 0.72222222 0.61111111 0.33333333 0.72222222 0.77777778] mean value: 0.6754385964912281 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.53333333 0.77777778 0.70588235 0.71428571 0.7 0.66666667 0.63157895 0.25 0.73684211 0.71428571] mean value: 0.6430652611921962 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 0.875 0.75 1. 0.63636364 0.83333333 0.6 0.28571429 0.7 1. ] mean value: 0.7347077922077923 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.44444444 0.7 0.66666667 0.55555556 0.77777778 0.55555556 0.66666667 0.22222222 0.77777778 0.55555556] mean value: 0.5922222222222222 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.62222222 0.79444444 0.72222222 0.77777778 0.66666667 0.72222222 0.61111111 0.33333333 0.72222222 0.77777778] mean value: 0.675 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.36363636 0.63636364 0.54545455 0.55555556 0.53846154 0.5 0.46153846 0.14285714 0.58333333 0.55555556] mean value: 0.4882756132756133 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.59 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.22927833 1.1960187 1.19522476 1.17464781 1.17565775 1.17176104 1.17601395 1.23562598 1.2031343 1.24709344] mean value: 1.2004456043243408 key: score_time value: [0.09242558 0.09341002 0.08967948 0.09073281 0.08897328 0.08774757 0.08941007 0.15643597 0.09522557 0.0925622 ] mean value: 0.0976602554321289 key: test_mcc value: [0.89893315 0.9 0.55555556 0.89442719 0.56980288 0.67082039 0.79772404 0.4472136 1. 0.89442719] mean value: 0.7628903993771927 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.94736842 0.94736842 0.77777778 0.94444444 0.77777778 0.83333333 0.88888889 0.72222222 1. 0.94444444] mean value: 0.8783625730994152 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.94736842 0.77777778 0.94117647 0.8 0.82352941 0.875 0.70588235 1. 0.94117647] mean value: 0.8753087375300997 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.77777778 1. 0.72727273 0.875 1. 0.75 1. 1. ] mean value: 0.9130050505050505 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.88888889 0.9 0.77777778 0.88888889 0.88888889 0.77777778 0.77777778 0.66666667 1. 0.88888889] mean value: 0.8455555555555555 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.94444444 0.95 0.77777778 0.94444444 0.77777778 0.83333333 0.88888889 0.72222222 1. 0.94444444] mean value: 0.8783333333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( value: [0.88888889 0.9 0.63636364 0.88888889 0.66666667 0.7 0.77777778 0.54545455 1. 0.88888889] mean value: 0.7892929292929293 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 MCC on Blind test: 0.59 Accuracy on Blind test: 0.81 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z...05', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.8364284 0.92184711 0.85616851 0.87351251 0.90050054 0.85522032 0.92096114 0.90089631 0.93183279 0.90460157] mean value: 0.8901969194412231 key: score_time value: [0.25036001 0.22233701 0.20914292 0.19844246 0.23667526 0.23699355 0.20669532 0.17988372 0.18153071 0.18519092] mean value: 0.21072518825531006 key: test_mcc value: [0.78888889 0.78888889 0.33333333 0.89442719 0.67082039 0.67082039 0.77777778 0.4472136 0.67082039 0.79772404] mean value: 0.6840714890356039 key: train_mcc value: [0.93927103 0.95121218 0.92682927 0.96348628 0.97590007 0.93909422 0.96348628 0.96348628 0.92793395 0.95150257] mean value: 0.9502202141599985 key: test_accuracy value: [0.89473684 0.89473684 0.66666667 0.94444444 0.83333333 0.83333333 0.88888889 0.72222222 0.83333333 0.88888889] mean value: 0.8400584795321637 key: train_accuracy value: [0.96932515 0.97546012 0.96341463 0.98170732 0.98780488 0.9695122 0.98170732 0.98170732 0.96341463 0.97560976] 
mean value: 0.9749663324854108 key: test_fscore value: [0.88888889 0.9 0.66666667 0.94117647 0.84210526 0.82352941 0.88888889 0.70588235 0.84210526 0.875 ] mean value: 0.8374243206054351 key: train_fscore value: [0.97005988 0.97560976 0.96341463 0.98181818 0.98795181 0.96969697 0.98181818 0.98181818 0.96428571 0.97590361] mean value: 0.97523769216074 key: test_precision value: [0.88888889 0.9 0.66666667 1. 0.8 0.875 0.88888889 0.75 0.8 1. ] mean value: 0.8569444444444444 key: train_precision value: [0.95294118 0.96385542 0.96341463 0.97590361 0.97619048 0.96385542 0.97590361 0.97590361 0.94186047 0.96428571] mean value: 0.9654114152956387 key: test_recall value: [0.88888889 0.9 0.66666667 0.88888889 0.88888889 0.77777778 0.88888889 0.66666667 0.88888889 0.77777778] mean value: 0.8233333333333333 key: train_recall value: [0.98780488 0.98765432 0.96341463 0.98780488 1. 0.97560976 0.98780488 0.98780488 0.98780488 0.98780488] mean value: 0.985350797952424 key: test_roc_auc value: [0.89444444 0.89444444 0.66666667 0.94444444 0.83333333 0.83333333 0.88888889 0.72222222 0.83333333 0.88888889] mean value: 0.84 key: train_roc_auc value: [0.96921108 0.97553448 0.96341463 0.98170732 0.98780488 0.9695122 0.98170732 0.98170732 0.96341463 0.97560976] mean value: 0.9749623607347184 key: test_jcc value: [0.8 0.81818182 0.5 0.88888889 0.72727273 0.7 0.8 0.54545455 0.72727273 0.77777778] mean value: 0.7284848484848485 key: train_jcc value: [0.94186047 0.95238095 0.92941176 0.96428571 0.97619048 0.94117647 0.96428571 0.96428571 0.93103448 0.95294118] mean value: 0.9517852931068177 MCC on Blind test: 0.65 Accuracy on Blind test: 0.84 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 
'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01126671 0.00995231 0.01127005 0.01492596 0.01139402 0.01019573 0.01002526 0.00952506 0.01013112 0.01003289] mean value: 0.01087191104888916 key: score_time value: [0.01013088 0.00956964 0.01013112 0.01141644 0.00992489 0.00955415 0.00914097 0.00919056 0.00934291 0.00942731] mean value: 0.009782886505126953 key: test_mcc value: [ 0.15555556 0.04494666 0.33333333 -0.11396058 0. 0.11396058 0.34188173 -0.2236068 0.33333333 0.4472136 ] mean value: 0.14326574068486644 key: train_mcc value: [0.44782413 0.41266129 0.42711521 0.45125307 0.48838629 0.43915503 0.49147319 0.48838629 0.47564513 0.40270863] mean value: 0.452460824775168 key: test_accuracy value: [0.57894737 0.52631579 0.66666667 0.44444444 0.5 0.55555556 0.66666667 0.38888889 0.66666667 0.72222222] mean value: 0.5716374269005848 key: train_accuracy value: [0.72392638 0.70552147 0.71341463 0.72560976 0.74390244 0.7195122 0.74390244 0.74390244 0.73780488 0.70121951] mean value: 0.7258716145443663 key: test_fscore value: [0.55555556 0.57142857 0.66666667 0.5 0.47058824 0.5 0.625 0.42105263 0.66666667 0.73684211] mean value: 0.5713800432453683 key: train_fscore value: [0.72727273 0.68831169 0.70807453 0.72727273 0.75 0.71604938 0.75862069 0.7375 0.73939394 0.70658683] mean value: 0.72590825151311 key: test_precision value: [0.55555556 0.54545455 0.66666667 0.45454545 0.5 0.57142857 0.71428571 0.4 0.66666667 0.7 ] mean value: 0.5774603174603175 key: train_precision value: [0.72289157 0.7260274 0.72151899 0.72289157 0.73255814 0.725 0.7173913 0.75641026 0.73493976 0.69411765] mean value: 
0.7253746623520101 key: test_recall value: [0.55555556 0.6 0.66666667 0.55555556 0.44444444 0.44444444 0.55555556 0.44444444 0.66666667 0.77777778] mean value: 0.5711111111111111 key: train_recall value: [0.73170732 0.65432099 0.69512195 0.73170732 0.76829268 0.70731707 0.80487805 0.7195122 0.74390244 0.7195122 ] mean value: 0.7276272207166516 key: test_roc_auc value: [0.57777778 0.52222222 0.66666667 0.44444444 0.5 0.55555556 0.66666667 0.38888889 0.66666667 0.72222222] mean value: 0.5711111111111111 key: train_roc_auc value: [0.72387835 0.70520927 0.71341463 0.72560976 0.74390244 0.7195122 0.74390244 0.74390244 0.73780488 0.70121951] mean value: 0.7258355916892503 key: test_jcc value: [0.38461538 0.4 0.5 0.33333333 0.30769231 0.33333333 0.45454545 0.26666667 0.5 0.58333333] mean value: 0.40635198135198136 key: train_jcc value: [0.57142857 0.52475248 0.54807692 0.57142857 0.6 0.55769231 0.61111111 0.58415842 0.58653846 0.5462963 ] mean value: 0.5701483133661351 MCC on Blind test: 0.38 Accuracy on Blind test: 0.7 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision 
Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'Z... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.12701035 0.04905677 0.05640721 0.05744123 0.0649879 0.05551696 0.09176779 0.0588882 0.04906058 0.05181313] mean value: 0.06619501113891602 key: score_time value: [0.01352024 0.01060653 0.01075244 0.01053977 0.0114274 0.01368093 0.01104403 0.01173997 0.01017952 0.01028299] mean value: 0.011377382278442382 key: test_mcc value: [1. 1. 0.89442719 0.89442719 1. 0.77777778 0.77777778 0.67082039 1. 1. ] mean value: 0.9015230330805324 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 1. 0.94444444 0.94444444 1. 0.88888889 0.88888889 0.83333333 1. 1. ] mean value: 0.95 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 1. 0.94736842 0.94117647 1. 0.88888889 0.88888889 0.82352941 1. 1. ] mean value: 0.9489852081183351 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.9 1. 1. 0.88888889 0.88888889 0.875 1. 1. ] mean value: 0.9552777777777778 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.88888889 1. 0.88888889 0.88888889 0.77777778 1. 1. ] mean value: 0.9444444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 1. 0.94444444 0.94444444 1. 0.88888889 0.88888889 0.83333333 1. 1. ] mean value: 0.95 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 1. 0.9 0.88888889 1. 0.8 0.8 0.7 1. 1. ] mean value: 0.9088888888888889 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.83 Accuracy on Blind test: 0.92 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), 
('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.02910137 0.04690957 0.02439976 0.02427816 0.05466056 0.05417943 0.02336264 0.02361846 0.02964282 0.03530502] mean value: 0.03454577922821045 key: score_time value: [0.01998067 0.01176548 0.01173735 0.01181555 0.02185011 0.02338123 0.0118885 0.01190233 0.02033043 0.01176739] mean value: 0.0156419038772583 key: test_mcc value: [ 0.80903983 0.68888889 0.89442719 0.67082039 0.56980288 0.70710678 0.56980288 -0.12403473 0. 0.70710678] mean value: 0.5492960900474898 key: train_mcc value: [1. 0.98780488 0.97590007 0.98787834 0.98787834 0.98787834 1. 1. 0.98787834 0.97560976] mean value: 0.9890828066723727 key: test_accuracy value: [0.89473684 0.84210526 0.94444444 0.83333333 0.77777778 0.83333333 0.77777778 0.44444444 0.5 0.83333333] mean value: 0.7681286549707602 key: train_accuracy value: [1. 0.99386503 0.98780488 0.99390244 0.99390244 0.99390244 1. 1. 
0.99390244 0.98780488] mean value: 0.9945084542869969 key: test_fscore value: [0.9 0.84210526 0.94117647 0.84210526 0.75 0.8 0.75 0.28571429 0.52631579 0.8 ] mean value: 0.7437417072091995 key: train_fscore value: [1. 0.99386503 0.98765432 0.99393939 0.99386503 0.99386503 1. 1. 0.99393939 0.98780488] mean value: 0.9944933078939763 key: test_precision value: [0.81818182 0.88888889 1. 0.8 0.85714286 1. 0.85714286 0.4 0.5 1. ] mean value: 0.8121356421356422 key: train_precision value: [1. 0.98780488 1. 0.98795181 1. 1. 1. 1. 0.98795181 0.98780488] mean value: 0.9951513370555393 key: test_recall value: [1. 0.8 0.88888889 0.88888889 0.66666667 0.66666667 0.66666667 0.22222222 0.55555556 0.66666667] mean value: 0.7022222222222222 key: train_recall value: [1. 1. 0.97560976 1. 0.98780488 0.98780488 1. 1. 1. 0.98780488] mean value: 0.9939024390243902 key: test_roc_auc value: [0.9 0.84444444 0.94444444 0.83333333 0.77777778 0.83333333 0.77777778 0.44444444 0.5 0.83333333] mean value: 0.7688888888888888 key: train_roc_auc value: [1. 0.99390244 0.98780488 0.99390244 0.99390244 0.99390244 1. 1. 0.99390244 0.98780488] mean value: 0.9945121951219512 key: test_jcc value: [0.81818182 0.72727273 0.88888889 0.72727273 0.6 0.66666667 0.6 0.16666667 0.35714286 0.66666667] mean value: 0.6218759018759019 key: train_jcc value: [1. 0.98780488 0.97560976 0.98795181 0.98780488 0.98780488 1. 1. 
0.98795181 0.97590361] mean value: 0.9890831619159565 MCC on Blind test: 0.27 Accuracy on Blind test: 0.62 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), 
('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02055335 0.00936055 0.00876904 0.00855136 0.0086956 0.00913787 0.01055503 0.01051044 0.00956655 0.0091722 ] mean value: 0.010487198829650879 key: score_time value: [0.00954032 0.00869036 0.00835061 0.00829196 0.00838614 0.00836682 0.01073599 0.00942254 0.00935078 0.00847292] mean value: 0.008960843086242676 key: test_mcc value: [0.15555556 0.36666667 0.47140452 0.23570226 0.56980288 0.12403473 0.2236068 0.11111111 0.4472136 0.55555556] mean value: 0.32606536802127717 key: train_mcc value: [0.46648209 0.43801421 0.50303909 0.45533504 0.48170179 0.46563593 0.46396698 0.50858153 0.45287265 0.45287265] mean value: 0.46885019391864274 key: test_accuracy value: [0.57894737 0.68421053 0.72222222 0.61111111 0.77777778 0.55555556 0.61111111 0.55555556 0.72222222 0.77777778] mean value: 0.6596491228070176 key: train_accuracy value: [0.73006135 0.71779141 0.75 0.72560976 0.73780488 0.73170732 0.73170732 0.75 0.72560976 0.72560976] mean value: 0.7325901541224001 key: test_fscore value: [0.55555556 0.7 0.76190476 0.66666667 0.8 0.63636364 0.63157895 0.55555556 0.70588235 0.77777778] mean value: 0.6791285254133551 key: train_fscore value: 
[0.75280899 0.72941176 0.76300578 0.74285714 0.75706215 0.74418605 0.73809524 0.77094972 0.73684211 0.73684211] mean value: 0.7472061039370119 key: test_precision value: [0.55555556 0.7 0.66666667 0.58333333 0.72727273 0.53846154 0.6 0.55555556 0.75 0.77777778] mean value: 0.6454623154623155 key: train_precision value: [0.69791667 0.69662921 0.72527473 0.69892473 0.70526316 0.71111111 0.72093023 0.71134021 0.70786517 0.70786517] mean value: 0.7083120381435539 key: test_recall value: [0.55555556 0.7 0.88888889 0.77777778 0.88888889 0.77777778 0.66666667 0.55555556 0.66666667 0.77777778] mean value: 0.7255555555555555 key: train_recall value: [0.81707317 0.7654321 0.80487805 0.79268293 0.81707317 0.7804878 0.75609756 0.84146341 0.76829268 0.76829268] mean value: 0.7911773562180067 key: test_roc_auc value: [0.57777778 0.68333333 0.72222222 0.61111111 0.77777778 0.55555556 0.61111111 0.55555556 0.72222222 0.77777778] mean value: 0.6594444444444445 key: train_roc_auc value: [0.72952424 0.7180819 0.75 0.72560976 0.73780488 0.73170732 0.73170732 0.75 0.72560976 0.72560976] mean value: 0.7325654923215899 key: test_jcc value: [0.38461538 0.53846154 0.61538462 0.5 0.66666667 0.46666667 0.46153846 0.38461538 0.54545455 0.63636364] mean value: 0.51997668997669 key: train_jcc value: [0.6036036 0.57407407 0.61682243 0.59090909 0.60909091 0.59259259 0.58490566 0.62727273 0.58333333 0.58333333] mean value: 0.5965937754493564 MCC on Blind test: 0.13 Accuracy on Blind test: 0.59 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra 
Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01109362 0.01412797 0.01536965 0.01582646 0.01812601 0.01766753 0.01889634 0.0176363 0.0157218 0.02554536] mean value: 0.0170011043548584 key: score_time value: [0.00872016 0.01174641 0.01328182 0.0126729 0.01187205 0.01244974 0.01293111 0.01268005 0.01434588 0.04189372] mean value: 0.015259385108947754 key: test_mcc value: [0.59554321 0.4719399 0.3721042 0.70710678 0.67082039 0.67082039 0.79772404 0.12403473 0.53452248 0.62017367] mean value: 0.5564789813598412 key: train_mcc value: [0.77808895 0.84213003 0.72987004 0.78978629 0.7200823 0.97590007 0.84162541 0.77964295 0.78978629 0.89565496] mean value: 0.8142567289846006 key: test_accuracy value: [0.78947368 0.73684211 0.66666667 0.83333333 0.83333333 0.83333333 0.88888889 0.55555556 0.72222222 0.77777778] mean value: 0.7637426900584795 key: train_accuracy value: [0.87730061 0.9202454 0.84756098 0.88414634 0.84146341 0.98780488 0.91463415 0.87804878 0.88414634 0.94512195] mean value: 0.8980472841538232 key: test_fscore value: [0.8 0.76190476 0.57142857 0.8 0.82352941 0.82352941 0.9 0.63636364 0.61538462 0.71428571] mean value: 0.7446426122896711 key: train_fscore value: [0.89130435 0.92215569 0.82014388 0.86896552 0.8115942 0.98765432 0.92134831 0.89130435 0.86896552 0.94193548] mean value: 0.8925371626013687 key: test_precision value: [0.72727273 0.72727273 0.8 1. 0.875 0.875 0.81818182 0.53846154 1. 1. ] mean value: 0.8361188811188811 key: train_precision value: [0.80392157 0.89534884 1. 1. 1. 1. 0.85416667 0.80392157 1. 1. 
] mean value: 0.9357358641130871 key: test_recall value: [0.88888889 0.8 0.44444444 0.66666667 0.77777778 0.77777778 1. 0.77777778 0.44444444 0.55555556] mean value: 0.7133333333333334 key: train_recall value: [1. 0.95061728 0.69512195 0.76829268 0.68292683 0.97560976 1. 1. 0.76829268 0.8902439 ] mean value: 0.8731105088828666 key: test_roc_auc value: [0.79444444 0.73333333 0.66666667 0.83333333 0.83333333 0.83333333 0.88888889 0.55555556 0.72222222 0.77777778] mean value: 0.7638888888888888 key: train_roc_auc value: [0.87654321 0.92043059 0.84756098 0.88414634 0.84146341 0.98780488 0.91463415 0.87804878 0.88414634 0.94512195] mean value: 0.8979900632339657 key: test_jcc value: [0.66666667 0.61538462 0.4 0.66666667 0.7 0.7 0.81818182 0.46666667 0.44444444 0.55555556] mean value: 0.6033566433566433 key: train_jcc value: [0.80392157 0.85555556 0.69512195 0.76829268 0.68292683 0.97560976 0.85416667 0.80392157 0.76829268 0.8902439 ] mean value: 0.8098053164355173 MCC on Blind test: 0.16 Accuracy on Blind test: 0.57 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01493216 0.01496792 0.01336694 0.01421976 0.01367188 0.01381302 0.01411867 0.01407051 0.01452351 0.01461649] mean value: 0.014230084419250489 key: score_time value: [0.00984478 0.01131344 0.01132298 0.01149678 0.01126051 0.01136065 0.01129603 0.01129508 0.01159763 0.01174164] mean value: 0.011252951622009278 key: test_mcc value: [0.68888889 0.80903983 0.56980288 0.55555556 0.56980288 0.34188173 0.67082039 0. 0.79772404 0.62017367] mean value: 0.5623689874789073 key: train_mcc value: [0.87291501 0.8299473 0.86007808 0.91524688 0.82065181 0.91798509 0.97590007 0.85224163 0.93909422 0.86294893] mean value: 0.8847009016543498 key: test_accuracy value: [0.84210526 0.89473684 0.77777778 0.77777778 0.77777778 0.66666667 0.83333333 0.5 0.88888889 0.77777778] mean value: 0.7736842105263158 key: train_accuracy value: [0.93251534 0.90797546 0.92682927 0.95731707 0.90243902 0.95731707 0.98780488 0.92073171 0.9695122 0.92682927] mean value: 0.9389271285350891 key: test_fscore value: [0.84210526 0.88888889 0.75 0.77777778 0.75 0.625 0.82352941 0.57142857 0.875 0.71428571] mean value: 0.7618015627303554 key: train_fscore value: [0.93714286 0.89795918 0.92207792 0.95808383 0.89189189 0.95541401 0.98765432 0.92655367 0.96969697 0.92105263] mean value: 0.9367527294440279 key: test_precision value: [0.8 1. 0.85714286 0.77777778 0.85714286 0.71428571 0.875 0.5 1. 1. ] mean value: 0.8381349206349207 key: train_precision value: [0.88172043 1. 0.98611111 0.94117647 1. 1. 1. 0.86315789 0.96385542 1. 
] mean value: 0.9636021328230462 key: test_recall value: [0.88888889 0.8 0.66666667 0.77777778 0.66666667 0.55555556 0.77777778 0.66666667 0.77777778 0.55555556] mean value: 0.7133333333333334 key: train_recall value: [1. 0.81481481 0.86585366 0.97560976 0.80487805 0.91463415 0.97560976 1. 0.97560976 0.85365854] mean value: 0.91806684733514 key: test_roc_auc value: [0.84444444 0.9 0.77777778 0.77777778 0.77777778 0.66666667 0.83333333 0.5 0.88888889 0.77777778] mean value: 0.7744444444444444 key: train_roc_auc value: [0.93209877 0.90740741 0.92682927 0.95731707 0.90243902 0.95731707 0.98780488 0.92073171 0.9695122 0.92682927] mean value: 0.9388286660644384 key: test_jcc value: [0.72727273 0.8 0.6 0.63636364 0.6 0.45454545 0.7 0.4 0.77777778 0.55555556] mean value: 0.6251515151515151 key: train_jcc value: [0.88172043 0.81481481 0.85542169 0.91954023 0.80487805 0.91463415 0.97560976 0.86315789 0.94117647 0.85365854] mean value: 0.8824612014684342 MCC on Blind test: 0.48 Accuracy on Blind test: 0.76 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.11141515 0.1019187 0.10138392 0.10054922 0.10579777 0.10397267 0.10807872 0.1059761 0.10747981 0.10852766] mean value: 0.10550997257232667 key: score_time value: [0.01460505 0.01543188 0.01478577 0.01445913 0.01539803 0.01554084 0.01605749 0.01555371 0.01592231 0.01543474] mean value: 0.015318894386291504 key: test_mcc value: [1. 0.9 0.89442719 1. 0.89442719 0.67082039 0.89442719 0.77777778 0.89442719 0.89442719] mean value: 0.8820734126027294 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.94736842 0.94444444 1. 0.94444444 0.83333333 0.94444444 0.88888889 0.94444444 0.94444444] mean value: 0.9391812865497076 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94736842 0.94736842 1. 0.94736842 0.82352941 0.94736842 0.88888889 0.94117647 0.94117647] mean value: 0.9384244926040592 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.9 1. 0.9 0.875 0.9 0.88888889 1. 1. ] mean value: 0.946388888888889 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.9 1. 1. 1. 0.77777778 1. 0.88888889 0.88888889 0.88888889] mean value: 0.9344444444444444 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.95 0.94444444 1. 0.94444444 0.83333333 0.94444444 0.88888889 0.94444444 0.94444444] mean value: 0.9394444444444444 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 0.9 0.9 1. 0.9 0.7 0.9 0.8 0.88888889 0.88888889] mean value: 0.8877777777777778 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.0429635 0.03572798 0.04963231 0.03834414 0.04578018 0.05369258 0.03833675 0.03687501 0.03249407 0.03297234] mean value: 0.040681886672973636 key: score_time value: [0.0182848 0.02209377 0.02278614 0.02101231 0.02734518 0.02648211 0.02281046 0.02151656 0.01736498 0.02486873] mean value: 0.02245650291442871 key: test_mcc value: [0.89893315 1. 0.89442719 0.89442719 1. 0.89442719 0.77777778 0.56980288 0.89442719 1. ] mean value: 0.882422257402662 key: train_mcc value: [0.98780488 0.98780305 0.98787834 0.97560976 0.98787834 0.98787834 0.98787834 1. 0.96348628 1. ] mean value: 0.9866217328816356 key: test_accuracy value: [0.94736842 1. 0.94444444 0.94444444 1. 0.94444444 0.88888889 0.77777778 0.94444444 1. ] mean value: 0.9391812865497076 key: train_accuracy value: [0.99386503 0.99386503 0.99390244 0.98780488 0.99390244 0.99390244 0.99390244 1. 0.98170732 1. 
] mean value: 0.9932852012569205 key: test_fscore value: [0.94117647 1. 0.94117647 0.94117647 1. 0.94736842 0.88888889 0.75 0.94736842 1. ] mean value: 0.9357155142758858 key: train_fscore value: [0.99386503 0.99378882 0.99386503 0.98780488 0.99386503 0.99393939 0.99386503 1. 0.98181818 1. ] mean value: 0.9932811396381519 key: test_precision value: [1. 1. 1. 1. 1. 0.9 0.88888889 0.85714286 0.9 1. ] mean value: 0.9546031746031746 key: train_precision value: [1. 1. 1. 0.98780488 1. 0.98795181 1. 1. 0.97590361 1. ] mean value: 0.9951660299735527 key: test_recall value: [0.88888889 1. 0.88888889 0.88888889 1. 1. 0.88888889 0.66666667 1. 1. ] mean value: 0.9222222222222222 key: train_recall value: [0.98780488 0.98765432 0.98780488 0.98780488 0.98780488 1. 0.98780488 1. 0.98780488 1. ] mean value: 0.9914483589280337 key: test_roc_auc value: [0.94444444 1. 0.94444444 0.94444444 1. 0.94444444 0.88888889 0.77777778 0.94444444 1. ] mean value: 0.9388888888888889 key: train_roc_auc value: [0.99390244 0.99382716 0.99390244 0.98780488 0.99390244 0.99390244 0.99390244 1. 0.98170732 1. ] mean value: 0.9932851550737729 key: test_jcc value: [0.88888889 1. 0.88888889 0.88888889 1. 0.9 0.8 0.6 0.9 1. ] mean value: 0.8866666666666667 key: train_jcc value: [0.98780488 0.98765432 0.98780488 0.97590361 0.98780488 0.98795181 0.98780488 1. 0.96428571 1. 
] mean value: 0.9867014969155238 MCC on Blind test: 0.73 Accuracy on Blind test: 0.86 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.04006433 0.06962156 0.07541966 0.06054902 0.05782652 0.06331706 0.05930591 0.05185676 0.05739498 0.05739808] mean value: 0.05927538871765137 key: score_time value: [0.0228579 0.0328567 0.01235533 0.02372789 0.02360535 0.02234149 0.02110028 0.0226531 0.02321887 0.02362919] mean value: 0.02283461093902588 key: test_mcc value: [ 0.71611487 0.72456884 0.56980288 0.89442719 0.34188173 0.4472136 0.55555556 -0.11396058 0.77777778 0.70710678] mean value: 0.5620488647586125 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.84210526 0.84210526 0.77777778 0.94444444 0.66666667 0.66666667 0.77777778 0.44444444 0.88888889 0.83333333] mean value: 0.7684210526315789 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 0.82352941 0.75 0.94117647 0.7 0.5 0.77777778 0.5 0.88888889 0.8 ] mean value: 0.7481372549019608 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.85714286 1. 0.63636364 1. 
0.77777778 0.45454545 0.88888889 1. ] mean value: 0.8614718614718615 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.66666667 0.7 0.66666667 0.88888889 0.77777778 0.33333333 0.77777778 0.55555556 0.88888889 0.66666667] mean value: 0.6922222222222222 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.83333333 0.85 0.77777778 0.94444444 0.66666667 0.66666667 0.77777778 0.44444444 0.88888889 0.83333333] mean value: 0.7683333333333333 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 0.7 0.6 0.88888889 0.53846154 0.33333333 0.63636364 0.33333333 0.8 0.66666667] mean value: 0.6163714063714063 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.57 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, 
max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.30276346 0.29489088 0.294976 0.30200529 0.29472876 0.29524827 0.29835391 0.29391432 0.29789948 0.30171204] mean value: 0.2976492404937744 key: score_time value: [0.00935555 0.00900364 0.00975752 0.00927353 0.00938869 0.00927615 0.00972366 0.00915432 0.0101068 0.00940204] mean value: 0.009444189071655274 key: test_mcc value: [1. 1. 0.89442719 0.89442719 0.89442719 0.89442719 0.77777778 0.77777778 1. 1. ] mean value: 0.9133264319555219 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 1. 0.94444444 0.94444444 0.94444444 0.94444444 0.88888889 0.88888889 1. 1. ] mean value: 0.9555555555555555 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 1. 0.94736842 0.94117647 0.94736842 0.94736842 0.88888889 0.88888889 1. 1. ] mean value: 0.9561059511523908 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.9 1. 0.9 0.9 0.88888889 0.88888889 1. 1. ] mean value: 0.9477777777777778 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 0.88888889 1. 1. 0.88888889 0.88888889 1. 1. ] mean value: 0.9666666666666667 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 1. 0.94444444 0.94444444 0.94444444 0.94444444 0.88888889 0.88888889 1. 1. ] mean value: 0.9555555555555555 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 1. 
0.9 0.88888889 0.9 0.9 0.8 0.8 1. 1. ] mean value: 0.9188888888888889 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.77 Accuracy on Blind test: 0.89 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01833081 0.02146482 0.01923108 0.01972938 0.0196569 0.01911902 0.01890945 0.01898503 0.01887536 0.01911259] mean value: 0.019341444969177245 key: score_time value: [0.01230168 0.01215196 0.01264477 0.01321173 0.01266956 0.01343489 0.01673889 0.01325631 0.01959014 0.01313806] mean value: 0.013913798332214355 key: test_mcc value: [0.72456884 0.89893315 0.79772404 0.79772404 0.53452248 0.70710678 0.53452248 0.35355339 0.53452248 0.79772404] mean value: 0.6680901716167226 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.84210526 0.94736842 0.88888889 0.88888889 0.72222222 0.83333333 0.72222222 0.61111111 0.72222222 0.88888889] mean value: 0.8067251461988304 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.95238095 0.9 0.9 0.7826087 0.85714286 0.7826087 0.72 0.7826087 0.9 ] mean value: 0.8434492753623188 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 key: test_precision value: [0.75 0.90909091 0.81818182 0.81818182 0.64285714 0.75 0.64285714 0.5625 0.64285714 0.81818182] mean value: 0.7354707792207793 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.85 0.94444444 0.88888889 0.88888889 0.72222222 0.83333333 0.72222222 0.61111111 0.72222222 0.88888889] mean value: 0.8072222222222222 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.90909091 0.81818182 0.81818182 0.64285714 0.75 0.64285714 0.5625 0.64285714 0.81818182] mean value: 0.7354707792207793 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.0 Accuracy on Blind test: 0.62 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, 
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 
'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.04197288 0.03873825 0.03924918 0.03602433 0.03344655 0.03352237 0.0336082 0.03359556 0.03350544 0.03394341] mean value: 0.03576061725616455 key: score_time value: [0.02062082 0.02285862 0.02000999 0.02112794 0.02082849 0.01994658 0.02333951 0.02297258 0.02233124 0.02153254] mean value: 0.021556830406188963 key: test_mcc value: [0.78888889 0.68543653 0.56980288 0.79772404 0.89442719 0.56980288 0.67082039 0.4472136 0.67082039 0.70710678] mean value: 0.6802043569726659 key: train_mcc value: [0.93871406 0.93872328 0.96348628 0.95150257 0.96348628 0.98787834 0.95150257 0.95121951 0.93909422 0.95150257] mean value: 0.9537109689949451 key: test_accuracy value: [0.89473684 0.84210526 0.77777778 0.88888889 0.94444444 0.77777778 0.83333333 0.72222222 0.83333333 0.83333333] mean value: 0.8347953216374269 key: train_accuracy value: [0.96932515 0.96932515 0.98170732 0.97560976 0.98170732 0.99390244 0.97560976 0.97560976 0.9695122 0.97560976] mean value: 0.9767918599431393 key: test_fscore value: [0.88888889 0.85714286 0.75 0.875 0.94736842 0.75 0.82352941 0.70588235 0.82352941 0.8 ] mean value: 0.8221341343554966 key: train_fscore value: [0.96969697 0.96932515 0.98159509 0.97530864 0.98181818 0.99386503 0.97530864 0.97560976 0.96969697 0.97530864] mean value: 0.9767533079309227 key: test_precision value: [0.88888889 0.81818182 0.85714286 1. 0.9 0.85714286 0.875 0.75 0.875 1. ] mean value: 0.8821356421356421 key: train_precision value: [0.96385542 0.96341463 0.98765432 0.9875 0.97590361 1. 
0.9875 0.97560976 0.96385542 0.9875 ] mean value: 0.9792793169062882 key: test_recall value: [0.88888889 0.9 0.66666667 0.77777778 1. 0.66666667 0.77777778 0.66666667 0.77777778 0.66666667] mean value: 0.7788888888888889 key: train_recall value: [0.97560976 0.97530864 0.97560976 0.96341463 0.98780488 0.98780488 0.96341463 0.97560976 0.97560976 0.96341463] mean value: 0.9743601324902138
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:188: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./pnca_8020.py:191: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
key: test_roc_auc value: [0.89444444 0.83888889 0.77777778 0.88888889 0.94444444 0.77777778 0.83333333 0.72222222 0.83333333 0.83333333] mean value: 0.8344444444444443 key: train_roc_auc value: [0.96928636 0.96936164 0.98170732 0.97560976 0.98170732 0.99390244 0.97560976 0.97560976 0.9695122 0.97560976] mean value: 0.9767916290274014 key: test_jcc value: [0.8 0.75 0.6 0.77777778 0.9 0.6 0.7 0.54545455 0.7 0.66666667] mean value: 0.7039898989898989 key: train_jcc value: [0.94117647 0.94047619 0.96385542 0.95180723 0.96428571 0.98780488 0.95180723 0.95238095 0.94117647 0.95180723] mean value: 0.9546577784801843 MCC on Blind test: 0.54 Accuracy on Blind test: 0.78 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 
'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'rsa', 'kd_values', ... 'VENM980101', 'VOGG950101', 'WEIL970101', 'WEIL970102', 'ZHAC000101', 'ZHAC000102', 'ZHAC000103', 'ZHAC000104', 'ZHAC000105', 'ZHAC000106'], dtype='object', length=166)), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.23004889 0.22601652 0.21089005 0.23468184 0.22533727 0.1308012 0.14139199 0.2118597 0.13306785 0.30921268] mean value: 0.20533080101013185 key: score_time value: [0.02214789 0.02047777 0.0222075 0.02512264 0.02310586 0.01235127 0.01761007 0.01181221 0.02391386 0.02258921] mean value: 0.020133829116821288 key: test_mcc value: [0.78888889 0.68888889 0.89442719 0.79772404 0.89442719 0.56980288 0.67082039 0.4472136 0.67082039 0.70710678] mean value: 0.7130120240479644 key: train_mcc value: [0.93871406 0.96326408 0.97590007 0.95150257 0.96348628 0.98787834 0.95150257 0.95121951 0.93909422 0.97560976] mean value: 0.9598171466702564 key: test_accuracy value: [0.89473684 0.84210526 0.94444444 0.88888889 0.94444444 0.77777778 0.83333333 0.72222222 0.83333333 0.83333333] mean value: 0.8514619883040936 key: train_accuracy value: [0.96932515 0.98159509 0.98780488 0.97560976 0.98170732 0.99390244 0.97560976 0.97560976 0.9695122 0.98780488] mean value: 0.9798481221008529 key: test_fscore value: [0.88888889 0.84210526 0.94117647 0.875 0.94736842 0.75 0.82352941 0.70588235 0.82352941 0.8 ] mean value: 0.8397480220158239 key: train_fscore value: [0.96969697 0.98159509 0.98765432 0.97530864 0.98181818 0.99386503 0.97530864 0.97560976 0.96969697 0.98780488] mean value: 0.9798358482996121 key: test_precision value: [0.88888889 0.88888889 1. 1. 0.9 0.85714286 0.875 0.75 0.875 1. 
] mean value: 0.9034920634920635 key: train_precision value: [0.96385542 0.97560976 1. 0.9875 0.97590361 1. 0.9875 0.97560976 0.96385542 0.98780488] mean value: 0.9817638848075227 key: test_recall value: [0.88888889 0.8 0.88888889 0.77777778 1. 0.66666667 0.77777778 0.66666667 0.77777778 0.66666667] mean value: 0.7911111111111111 key: train_recall value: [0.97560976 0.98765432 0.97560976 0.96341463 0.98780488 0.98780488 0.96341463 0.97560976 0.97560976 0.98780488] mean value: 0.9780337247816923 key: test_roc_auc value: [0.89444444 0.84444444 0.94444444 0.88888889 0.94444444 0.77777778 0.83333333 0.72222222 0.83333333 0.83333333] mean value: 0.8516666666666666 key: train_roc_auc value: [0.96928636 0.98163204 0.98780488 0.97560976 0.98170732 0.99390244 0.97560976 0.97560976 0.9695122 0.98780488] mean value: 0.9798479373682626 key: test_jcc value: [0.8 0.72727273 0.88888889 0.77777778 0.9 0.6 0.7 0.54545455 0.7 0.66666667] mean value: 0.7306060606060606 key: train_jcc value: [0.94117647 0.96385542 0.97560976 0.95180723 0.96428571 0.98780488 0.95180723 0.95238095 0.94117647 0.97590361] mean value: 0.9605807735965383 MCC on Blind test: 0.54 Accuracy on Blind test: 0.78