/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 817
PASS: my_features_df and aa_df successfully combined
nrows: 817
ncols: 269
count of NULL values before imputation
or_mychisq          244
log10_or_mychisq    244
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
No. of numerical features: 45
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
Original Data Counter({1: 309, 0: 158})
Data dim: (467, 52)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training actual values: training set imputed values: blind test set
Train data size: (467, 52)
Test data size: (350, 52)
y_train numbers: Counter({1: 309, 0: 158})
y_train ratio: 0.511326860841424
y_test_numbers: Counter({0: 315, 1: 35})
y_test ratio: 9.0
-------------------------------------------------------------
Simple Random OverSampling
Counter({1: 309, 0: 309})
(618, 52)
Simple Random UnderSampling
Counter({0: 158, 1: 158})
(316, 52)
Simple Combined Over and UnderSampling
Counter({0: 309, 1: 309})
(618, 52)
SMOTE_NC OverSampling
Counter({1: 309, 0: 309})
(618, 52)
#####################################################################
Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: katG
Drug name: isoniazid
Output directory: /home/tanu/git/Data/isoniazid/output/ml/uq_v1/
Sanity checks:
Total input features: 52
Training data size: (467, 52)
Test data size: (350, 52)
Target feature numbers (training data): Counter({1: 309, 0: 158})
Target features ratio (training data: 0.511326860841424
Target feature numbers (test data): Counter({0: 315, 1: 35})
Target features ratio (test data): 9.0
#####################################################################
================================================================
Strucutral features (n): 36
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm',
'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
('Gaussian NB', GaussianNB()),
('Naive Bayes', BernoulliNB()),
('K-Nearest Neighbors', KNeighborsClassifier()),
('SVM', SVC(random_state=42)),
('MLP', MLPClassifier(max_iter=500, random_state=42)),
('Decision Tree', DecisionTreeClassifier(random_state=42)),
('Extra Trees', ExtraTreesClassifier(random_state=42)),
('Extra Tree', ExtraTreeClassifier(random_state=42)),
('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
('Naive Bayes', BernoulliNB()),
('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None,
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
('LDA', LinearDiscriminantAnalysis()),
('Multinomial', MultinomialNB()),
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
('Gaussian Process', GaussianProcessClassifier(random_state=42)),
('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
('QDA', QuadraticDiscriminantAnalysis()),
('Ridge Classifier', RidgeClassifier(random_state=42)),
('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
  ColumnTransformer(remainder='passthrough',
    transformers=[('num', MinMaxScaler(),
      Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
      'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')),
    ('cat', OneHotEncoder(),
      Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
  ('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02167606 0.02372026 0.03166604 0.02357769 0.02548194 0.02195692 0.02136278 0.02161574 0.02221417 0.02264333]
mean value: 0.023591494560241698
key: score_time
value: [0.0109992 0.01075363 0.01093793 0.01066351 0.01062679 0.01058674 0.01058102 0.01062608 0.0105927 0.01066446]
mean value: 0.010703206062316895
key: test_mcc
value: [0.90662544 0.66402366 0.60908698 0.90662544 0.86070252 0.66337469 0.67402153 0.80215054 0.66040066 0.85943956]
mean value: 0.7606451028769974
key: train_mcc
value: [0.83338837 0.82273265 0.789683 0.77877628 0.76217448 0.80630977 0.79579908 0.77434754 0.7963019 0.80086095]
mean value: 0.7960374023577294
key: test_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.95744681 0.85106383 0.82978723 0.95744681 0.93617021 0.85106383 0.85106383 0.91304348 0.84782609 0.93478261] mean value: 0.8929694727104533 key: train_accuracy value: [0.92619048 0.92142857 0.90714286 0.90238095 0.8952381 0.91428571 0.90952381 0.90023753 0.90973872 0.91211401] mean value: 0.9098280737473137 key: test_fscore value: [0.96875 0.88888889 0.87878788 0.96875 0.95384615 0.89552239 0.8852459 0.93548387 0.8852459 0.95238095] mean value: 0.9212901936210006 key: train_fscore value: [0.94532628 0.94240838 0.93169877 0.92869565 0.92334495 0.93728223 0.93425606 0.92682927 0.93379791 0.93542757] mean value: 0.9339067066812484 key: test_precision value: [0.93939394 0.875 0.82857143 0.93939394 0.91176471 0.83333333 0.9 0.93548387 0.9 0.90909091] mean value: 0.8972032126633644 key: train_precision value: [0.92733564 0.91525424 0.90784983 0.8989899 0.89527027 0.90878378 0.9 0.89864865 0.90540541 0.91156463] mean value: 0.9069102339726427 key: test_recall value: [1. 0.90322581 0.93548387 1. 1. 0.96774194 0.87096774 0.93548387 0.87096774 1. 
] mean value: 0.9483870967741935 key: train_recall value: [0.96402878 0.97122302 0.95683453 0.96043165 0.95323741 0.9676259 0.97122302 0.95683453 0.96402878 0.96057348] mean value: 0.962604110260179 key: test_roc_auc value: [0.9375 0.8266129 0.78024194 0.9375 0.90625 0.79637097 0.84173387 0.90107527 0.83548387 0.90625 ] mean value: 0.8669018817204301 key: train_roc_auc value: [0.90807073 0.89758334 0.88334684 0.87458202 0.86746378 0.88874253 0.87997771 0.87352216 0.88411229 0.88873744] mean value: 0.8846138841461438 key: test_jcc value: [0.93939394 0.8 0.78378378 0.93939394 0.91176471 0.81081081 0.79411765 0.87878788 0.79411765 0.90909091] mean value: 0.8561261261261262 key: train_jcc value: [0.89632107 0.89108911 0.87213115 0.86688312 0.85760518 0.88196721 0.87662338 0.86363636 0.87581699 0.87868852] mean value: 0.8760762092991343 MCC on Blind test: 0.23 Accuracy on Blind test: 0.45 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.74151611 1.08886981 0.69769788 0.70535755 0.87346554 0.72519445 0.73284912 0.83741045 0.65530038 0.68675303] mean value: 0.7744414329528808 key: score_time value: [0.01378059 0.01389503 0.01416969 0.01405454 0.01443934 0.0140748 0.01437092 0.01120043 0.0144248 0.01425123] mean value: 0.013866138458251954 key: test_mcc value: [1. 0.8566725 1. 0.95299692 0.90662544 0.76032282 0.90524194 0.9085301 0.85513419 0.85513419] mean value: 0.90006580934109 key: train_mcc value: [0.93593571 0.96269263 0.94130059 0.93593571 0.95736701 0.95734993 0.94131391 0.9469026 0.95756757 0.95740101] mean value: 0.9493766673456756 key: test_accuracy value: [1. 0.93617021 1. 0.9787234 0.95744681 0.89361702 0.95744681 0.95652174 0.93478261 0.93478261] mean value: 0.9549491211840888 key: train_accuracy value: [0.97142857 0.98333333 0.97380952 0.97142857 0.98095238 0.98095238 0.97380952 0.97624703 0.98099762 0.98099762] mean value: 0.9773956565999321 key: test_fscore value: [1. 0.95238095 1. 0.98412698 0.96875 0.92307692 0.96774194 0.96666667 0.95081967 0.95081967] mean value: 0.9664382805997692 key: train_fscore value: [0.97857143 0.98747764 0.980322 0.97857143 0.98571429 0.98566308 0.98039216 0.98214286 0.98571429 0.98571429] mean value: 0.983028345294684 key: test_precision value: [1. 0.9375 1. 0.96875 0.93939394 0.88235294 0.96774194 1. 
0.96666667 0.93548387] mean value: 0.959788935368869 key: train_precision value: [0.97163121 0.98220641 0.97508897 0.97163121 0.9787234 0.98214286 0.97173145 0.9751773 0.9787234 0.98220641] mean value: 0.9769262610088233 key: test_recall value: [1. 0.96774194 1. 1. 1. 0.96774194 0.96774194 0.93548387 0.93548387 0.96666667] mean value: 0.9740860215053764 key: train_recall value: [0.98561151 0.99280576 0.98561151 0.98561151 0.99280576 0.98920863 0.98920863 0.98920863 0.99280576 0.98924731] mean value: 0.9892125009669683 key: test_roc_auc value: [1. 0.92137097 1. 0.96875 0.9375 0.85887097 0.95262097 0.96774194 0.9344086 0.92083333] mean value: 0.9462096774193549 key: train_roc_auc value: [0.96463674 0.97879724 0.96815787 0.96463674 0.97527612 0.97699868 0.9664353 0.97012879 0.97542386 0.97701802] mean value: 0.9717509367831001 key: test_jcc value: [1. 0.90909091 1. 0.96875 0.93939394 0.85714286 0.9375 0.93548387 0.90625 0.90625 ] mean value: 0.9359861576595447 key: train_jcc value: [0.95804196 0.97526502 0.96140351 0.95804196 0.97183099 0.97173145 0.96153846 0.96491228 0.97183099 0.97183099] mean value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( 0.9666427591273636 MCC on Blind test: 0.14 Accuracy on Blind test: 0.32 Model_name: Gaussian NB Model func: GaussianNB() Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01048064 0.00996375 0.00781631 0.00742173 0.00739574 0.00739932 0.00733399 0.00878787 0.0088315 0.00836849] mean value: 0.008379936218261719 key: score_time value: [0.01066589 0.00898504 0.0085032 0.0080471 0.00794482 0.0084784 0.00797868 0.00964499 0.00877905 0.00852466] mean value: 0.008755183219909668 key: test_mcc value: [0.8566725 0.50614703 0.62096774 0.76032282 0.81048387 0.71572581 0.59764284 0.75776742 0.60430108 0.36514837] mean value: 0.6595179479313003 key: train_mcc value: [0.70671585 0.70811111 0.71695894 0.68716403 0.71727396 0.73126698 0.71138479 0.71852622 0.74194944 0.54109586] mean value: 0.6980447184919443 key: test_accuracy value: [0.93617021 0.76595745 0.82978723 0.89361702 0.91489362 0.87234043 0.80851064 0.89130435 0.82608696 0.67391304] mean value: 0.8412580943570768 key: train_accuracy value: [0.87142857 0.86666667 0.86904762 0.85714286 0.87142857 0.87857143 0.86904762 0.87173397 0.88361045 0.74821853] mean value: 0.8586896278701505 key: test_fscore value: [0.95238095 0.81355932 0.87096774 0.92307692 0.93548387 0.90322581 0.84745763 0.91803279 0.87096774 0.71698113] mean value: 0.8752133904861458 key: train_fscore value: [0.90721649 0.8974359 0.89833641 0.89010989 0.90145985 0.90744102 0.89981785 0.90145985 0.91139241 0.77916667] mean value: 0.8893836343169823 key: test_precision value: [0.9375 0.85714286 0.87096774 0.88235294 0.93548387 0.90322581 0.89285714 0.93333333 0.87096774 0.82608696] mean value: 0.8909918392321865 key: train_precision value: [0.86842105 0.9141791 0.92395437 0.90671642 0.91481481 0.91575092 
0.91143911 0.91481481 0.91636364 0.93034826] mean value: 0.9116802502485006 key: test_recall value: [0.96774194 0.77419355 0.87096774 0.96774194 0.93548387 0.90322581 0.80645161 0.90322581 0.87096774 0.63333333] mean value: 0.8633333333333333 key: train_recall value: [0.94964029 0.88129496 0.87410072 0.87410072 0.88848921 0.89928058 0.88848921 0.88848921 0.90647482 0.6702509 ] mean value: 0.8720610608287563 key: test_roc_auc value: [0.92137097 0.76209677 0.81048387 0.85887097 0.90524194 0.8578629 0.80947581 0.88494624 0.80215054 0.69166667] mean value: 0.8304166666666667 key: train_roc_auc value: [0.83397507 0.85966157 0.86662782 0.84902219 0.86325869 0.86865437 0.85973756 0.86382502 0.87281783 0.78582967] mean value: 0.8523409805276452 key: test_jcc value: [0.90909091 0.68571429 0.77142857 0.85714286 0.87878788 0.82352941 0.73529412 0.84848485 0.77142857 0.55882353] mean value: 0.7839724980901451 key: train_jcc value: [0.83018868 0.81395349 0.81543624 0.8019802 0.82059801 0.83056478 0.81788079 0.82059801 0.8372093 0.63822526] mean value: 0.8026634757590374 MCC on Blind test: 0.22 Accuracy on Blind test: 0.56 Model_name: Naive Bayes Model func: BernoulliNB() Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00818086 0.0079577 0.00792146 0.00760269 0.00763583 0.00755167 0.00756431 0.00766754 0.00792885 0.00768661] mean value: 0.00776975154876709 key: score_time value: [0.00826931 0.00858402 0.0080297 0.00802231 0.00803876 0.00794721 0.00808096 0.00816536 0.00821137 0.0079844 ] mean value: 0.008133339881896972 key: test_mcc value: [0.76746995 0.61207663 0.31752781 0.71206211 0.76032282 0.6139232 0.66402366 0.59332241 0.38733878 0.70954337] mean value: 0.6137610732708011 key: train_mcc value: [0.62791789 0.64521328 0.66619129 0.63945586 0.63982246 0.63982246 0.6506538 0.65794031 0.65846852 0.63442864] mean value: 0.6459914516114823 key: test_accuracy value: [0.89361702 0.82978723 0.70212766 0.87234043 0.89361702 0.82978723 0.85106383 0.82608696 0.73913043 0.86956522] mean value: 0.8307123034227567 key: train_accuracy value: [0.83809524 0.8452381 0.85238095 0.84285714 0.84285714 0.84285714 0.84761905 0.85035629 0.85035629 0.84085511] mean value: 0.8453472457866757 key: test_fscore value: [0.91803279 0.875 0.78125 0.90909091 0.92307692 0.88235294 0.88888889 0.875 0.8125 0.90625 ] mean value: 0.8771442449118437 key: train_fscore value: [0.88316151 0.88773748 0.89007092 0.8862069 0.88581315 0.88581315 0.88965517 0.89156627 0.89081456 0.88468158] mean value: 0.8875520685563664 key: test_precision value: [0.93333333 0.84848485 0.75757576 0.85714286 0.88235294 0.81081081 0.875 0.84848485 0.78787879 0.85294118] mean value: 0.8454005361358302 key: train_precision value: [0.84539474 0.8538206 0.87762238 0.85099338 0.85333333 0.85333333 0.85430464 0.85478548 
0.85953177 0.85099338] mean value: 0.8554113020989377 key: test_recall value: [0.90322581 0.90322581 0.80645161 0.96774194 0.96774194 0.96774194 0.90322581 0.90322581 0.83870968 0.96666667] mean value: 0.9127956989247312 key: train_recall value: [0.92446043 0.92446043 0.9028777 0.92446043 0.92086331 0.92086331 0.92805755 0.93165468 0.92446043 0.92114695] mean value: 0.9223305226786314 key: test_roc_auc value: [0.8891129 0.7953629 0.65322581 0.82762097 0.85887097 0.76512097 0.8266129 0.78494624 0.68602151 0.82708333] mean value: 0.7913978494623656 key: train_roc_auc value: [0.79673726 0.80730064 0.82819941 0.80377951 0.80550208 0.80550208 0.8090992 0.81198118 0.81537707 0.80212277] mean value: 0.8085601200017799 key: test_jcc value: [0.84848485 0.77777778 0.64102564 0.83333333 0.85714286 0.78947368 0.8 0.77777778 0.68421053 0.82857143] mean value: 0.783779787463998 key: train_jcc value: [0.79076923 0.79813665 0.80191693 0.79566563 0.79503106 0.79503106 0.80124224 0.80434783 0.803125 0.79320988] mean value: 0.7978475494770487 MCC on Blind test: 0.24 Accuracy on Blind test: 0.47 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00735164 0.00838447 0.00824022 0.00810122 0.00807238 0.0080514 0.00810838 0.00790358 0.00764203 0.00775075] mean value: 0.00796060562133789 key: score_time value: [0.09431863 0.01160264 0.01148391 0.01503587 0.01451468 0.0130167 0.01415229 0.01103735 0.01092792 0.01100278] mean value: 0.02070927619934082 key: test_mcc value: [0.76746995 0.76034808 0.4031367 0.65994312 0.71025956 0.61207663 0.56769924 0.58251534 0.49033059 0.48102958] mean value: 0.6034808785180602 key: train_mcc value: [0.69858559 0.69632669 0.75172804 0.69676775 0.73520628 0.71297421 0.70164234 0.70915156 0.73690278 0.72050578] mean value: 0.7159791011797761 key: test_accuracy value: [0.89361702 0.89361702 0.74468085 0.85106383 0.87234043 0.82978723 0.80851064 0.80434783 0.7826087 0.76086957] mean value: 0.8241443108233117 key: train_accuracy value: [0.86666667 0.86666667 0.89047619 0.86666667 0.88333333 0.87380952 0.86904762 0.87173397 0.88361045 0.87648456] mean value: 0.8748495645288994 key: test_fscore value: [0.91803279 0.92063492 0.81818182 0.89230769 0.90625 0.875 0.85714286 0.84745763 0.84375 0.81355932] mean value: 0.8692317024305076 key: train_fscore value: [0.90070922 0.9020979 0.91901408 0.90175439 0.91388401 0.90718039 0.90401396 0.90526316 0.91358025 0.90812721] mean value: 0.9075624559641324 key: test_precision value: [0.93333333 0.90625 0.77142857 0.85294118 0.87878788 0.84848485 0.84375 0.89285714 0.81818182 0.82758621] mean value: 0.8573600976440733 key: train_precision value: [0.88811189 0.87755102 0.9 0.88013699 0.89347079 0.88395904 0.8779661 
0.88356164 0.89619377 0.89547038] mean value: 0.887642163000012 key: test_recall value: [0.90322581 0.93548387 0.87096774 0.93548387 0.93548387 0.90322581 0.87096774 0.80645161 0.87096774 0.8 ] mean value: 0.8832258064516129 key: train_recall value: [0.91366906 0.92805755 0.93884892 0.92446043 0.9352518 0.93165468 0.93165468 0.92805755 0.93165468 0.92114695] mean value: 0.9284456305923003 key: test_roc_auc value: [0.8891129 0.87399194 0.68548387 0.81149194 0.84274194 0.7953629 0.77923387 0.80322581 0.73548387 0.74375 ] mean value: 0.7959879032258065 key: train_roc_auc value: [0.84415848 0.83726821 0.86731178 0.83899078 0.85847097 0.84610903 0.83906677 0.84514766 0.86093223 0.85493967] mean value: 0.8492395591157109 key: test_jcc value: [0.84848485 0.85294118 0.69230769 0.80555556 0.82857143 0.77777778 0.75 0.73529412 0.72972973 0.68571429] mean value: 0.7706376612258965 key: train_jcc value: [0.81935484 0.82165605 0.85016287 0.82108626 0.84142395 0.83012821 0.82484076 0.82692308 0.84090909 0.83171521] mean value: 0.8308200313963069 MCC on Blind test: 0.2 Accuracy on Blind test: 0.45 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, 
colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01428485 0.0115571 0.0116024 0.01192141 0.0118351 0.01425028 0.01206684 0.01400876 0.01199913 0.01239347] mean value: 0.012591934204101563 key: score_time value: [0.00854349 0.00847244 0.00854373 0.00855494 0.00946355 0.00848031 0.00859547 0.00851321 0.0085175 0.00893998] mean value: 0.00866246223449707 key: test_mcc value: [0.8566725 0.71206211 0.50611184 0.76032282 0.66337469 0.6139232 0.65994312 0.64852426 0.38733878 0.72168784] mean value: 0.6529961162737778 key: train_mcc value: [0.69022744 0.66164278 0.68466145 0.65612626 0.66739922 0.67302425 0.67350891 0.66972224 0.68052658 0.67334868] mean value: 0.6730187805126882 key: test_accuracy value: [0.93617021 0.87234043 0.78723404 0.89361702 0.85106383 0.82978723 0.85106383 0.84782609 0.73913043 0.86956522] mean value: 0.8477798334875115 key: train_accuracy value: [0.86428571 0.85238095 0.86190476 0.85 0.8547619 0.85714286 0.85714286 0.85510689 0.85985748 0.85748219] mean value: 0.8570065603438525 key: test_fscore value: [0.95238095 0.90909091 0.84848485 0.92307692 0.89552239 0.88235294 0.89230769 0.88888889 0.8125 0.90909091] mean value: 0.8913696452557295 key: train_fscore value: [0.90289608 0.89419795 0.90136054 0.89303905 0.89608177 0.89761092 0.89830508 0.89678511 0.89948893 0.89795918] mean value: 0.8977724625814629 key: test_precision value: [0.9375 0.85714286 0.8 0.88235294 0.83333333 0.81081081 0.85294118 0.875 0.78787879 0.83333333] mean value: 0.8470293240146182 key: train_precision value: [0.85760518 0.85064935 0.85483871 0.84565916 0.85113269 0.8538961 0.84935897 0.84664537 
0.85436893 0.85436893] mean value: 0.8518523398136467 key: test_recall value: [0.96774194 0.96774194 0.90322581 0.96774194 0.96774194 0.96774194 0.93548387 0.90322581 0.83870968 1. ] mean value: 0.9419354838709677 key: train_recall value: [0.95323741 0.94244604 0.95323741 0.94604317 0.94604317 0.94604317 0.95323741 0.95323741 0.94964029 0.94623656] mean value: 0.9489402026765684 key: test_roc_auc value: [0.92137097 0.82762097 0.7328629 0.85887097 0.79637097 0.76512097 0.81149194 0.81827957 0.68602151 0.8125 ] mean value: 0.8030510752688172 key: train_roc_auc value: [0.82168913 0.80925119 0.818168 0.8040075 0.81104975 0.81457088 0.81112575 0.80878654 0.81747749 0.81466758] mean value: 0.813079379384182 key: test_jcc value: [0.90909091 0.83333333 0.73684211 0.85714286 0.81081081 0.78947368 0.80555556 0.8 0.68421053 0.83333333] mean value: 0.8059793115056273 key: train_jcc value: [0.82298137 0.80864198 0.82043344 0.80674847 0.8117284 0.81424149 0.81538462 0.81288344 0.81733746 0.81481481] mean value: 0.8145195452770848 MCC on Blind test: 0.25 Accuracy on Blind test: 0.45 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.31461072 1.40823627 1.28151274 1.4231658 1.334095 1.30517697 1.41587329 1.28833318 1.49593544 1.34908724] mean value: 1.3616026639938354 key: score_time value: [0.01176286 0.01351857 0.0135088 0.01388788 0.01229548 0.01362157 0.01102948 0.01351404 0.01373792 0.01853848] mean value: 0.013541507720947265 key: test_mcc value: [1. 0.8084425 0.90662544 1. 0.95299692 0.76032282 0.90524194 0.90107527 0.74930844 0.80833333] mean value: 0.8792346661083966 key: train_mcc value: [0.9680267 0.95736701 0.94674008 0.9680267 0.96269263 0.9680267 0.9628398 0.96296053 0.95222181 0.99470992] mean value: 0.9643611879690016 key: test_accuracy value: [1. 0.91489362 0.95744681 1. 0.9787234 0.89361702 0.95744681 0.95652174 0.89130435 0.91304348] mean value: 0.9462997224791859 key: train_accuracy value: [0.98571429 0.98095238 0.97619048 0.98571429 0.98333333 0.98571429 0.98333333 0.98337292 0.97862233 0.9976247 ] mean value: 0.9840572333446442 key: test_fscore value: [1. 0.9375 0.96875 1. 
0.98412698 0.92307692 0.96774194 0.96774194 0.92063492 0.93333333] mean value: 0.9602906032139903 key: train_fscore value: [0.98924731 0.98571429 0.98220641 0.98924731 0.98747764 0.98924731 0.98738739 0.98752228 0.98389982 0.99820467] mean value: 0.9880154423532531 key: test_precision value: [1. 0.90909091 0.93939394 1. 0.96875 0.88235294 0.96774194 0.96774194 0.90625 0.93333333] mean value: 0.9474654993962395 key: train_precision value: [0.98571429 0.9787234 0.97183099 0.98571429 0.98220641 0.98571429 0.98916968 0.97879859 0.97864769 1. ] mean value: 0.9836519601503051 key: test_recall value: [1. 0.96774194 1. 1. 1. 0.96774194 0.96774194 0.96774194 0.93548387 0.93333333] mean value: 0.9739784946236559 key: train_recall value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.98561151 0.99640288 0.98920863 0.99641577] mean value: 0.9924473324566154 key: test_roc_auc value: [1. 0.89012097 0.9375 1. 0.96875 0.85887097 0.95262097 0.95053763 0.86774194 0.90416667] mean value: 0.9330309139784947 key: train_roc_auc value: [0.98231837 0.97527612 0.96823386 0.98231837 0.97879724 0.98231837 0.98224238 0.97722242 0.9736253 0.99820789] mean value: 0.980056031046588 key: test_jcc value: [1. 0.88235294 0.93939394 1. 
0.96875 0.85714286 0.9375 0.9375 0.85294118 0.875 ] mean value: 0.9250580914183856 key: train_jcc value: [0.9787234 0.97183099 0.96503497 0.9787234 0.97526502 0.9787234 0.97508897 0.97535211 0.96830986 0.99641577] mean value: 0.9763467891796095 MCC on Blind test: 0.13 Accuracy on Blind test: 0.31 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01342225 0.01069498 0.00975204 0.01033854 0.00994968 0.01040697 0.01035452 0.01077914 0.01066208 0.01090336] mean value: 0.010726356506347656 key: score_time value: [0.01061678 0.00818062 0.00800824 0.00842381 0.00850368 0.00858855 0.00848293 0.00844717 0.00845146 0.00849843] mean value: 0.008620166778564453 key: test_mcc value: [0.95299692 0.8566725 0.91188882 1. 0.86091836 0.8566725 0.87213027 0.95250095 0.90107527 0.80833333] mean value: 0.8973188916801316 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9787234 0.93617021 0.95744681 1. 0.93617021 0.93617021 0.93617021 0.97826087 0.95652174 0.91304348] mean value: 0.9528677150786309 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 
1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98412698 0.95238095 0.96666667 1. 0.95081967 0.95238095 0.94915254 0.98360656 0.96774194 0.93333333] mean value: 0.9640209596253838 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96875 0.9375 1. 1. 0.96666667 0.9375 1. 1. 0.96774194 0.93333333] mean value: 0.9711491935483871 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 0.93548387 1. 0.93548387 0.96774194 0.90322581 0.96774194 0.96774194 0.93333333] mean value: 0.9578494623655914 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96875 0.92137097 0.96774194 1. 0.93649194 0.92137097 0.9516129 0.98387097 0.95053763 0.90416667] mean value: 0.9505913978494623 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96875 0.90909091 0.93548387 1. 0.90625 0.90909091 0.90322581 0.96774194 0.9375 0.875 ] mean value: 0.9312133431085043 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.2 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10349464 0.09808111 0.10309243 0.10495615 0.10384583 0.10514021 0.10287976 0.10301304 0.10425162 0.10234761] mean value: 0.10311024188995362 key: score_time value: [0.01685739 0.01713133 0.01867747 0.01792812 0.01854682 0.01873803 0.01870346 0.01733375 0.01832008 0.01786637] mean value: 0.018010282516479494 key: test_mcc value: [0.90662544 0.8084425 0.81503725 0.90662544 0.86070252 0.76032282 0.81048387 0.85009261 0.8059304 0.90571105] mean value: 0.8429973908395795 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95744681 0.91489362 0.91489362 0.95744681 0.93617021 0.89361702 0.91489362 0.93478261 0.91304348 0.95652174] mean value: 0.9293709528214616 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.96875 0.9375 0.93939394 0.96875 0.95384615 0.92307692 0.93548387 0.95238095 0.93939394 0.96774194] mean value: 0.9486317714543521 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93939394 0.90909091 0.88571429 0.93939394 0.91176471 0.88235294 0.93548387 0.9375 0.88571429 0.9375 ] mean value: 0.9163908877333925 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 1. 1. 1. 0.96774194 0.93548387 0.96774194 1. 1. ] mean value: 0.9838709677419355 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.89012097 0.875 0.9375 0.90625 0.85887097 0.90524194 0.9172043 0.86666667 0.9375 ] mean value: 0.9031854838709678 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93939394 0.88235294 0.88571429 0.93939394 0.91176471 0.85714286 0.87878788 0.90909091 0.88571429 0.9375 ] mean value: 0.9026855742296919 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.2 Accuracy on Blind test: 0.36 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00836992 0.00826001 0.00835204 0.00824237 0.00821042 0.00798821 0.00828552 0.00832677 0.00851941 0.00838804] mean value: 0.008294272422790527 key: score_time value: [0.00871825 0.00869298 0.00867295 0.0086019 0.00861168 0.00866127 0.0086937 0.00873017 0.0088346 0.0087533 ] mean value: 0.008697080612182616 key: test_mcc value: [0.86091836 0.71206211 0.65309894 0.81952077 0.8084425 0.65994312 0.50614703 0.60602162 0.44695591 0.72379255] mean value: 0.6796902925193711 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.93617021 0.87234043 0.82978723 0.91489362 0.91489362 0.85106383 0.76595745 0.80434783 0.76086957 0.86956522] mean value: 0.8519888991674376 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.95081967 0.90909091 0.86206897 0.93333333 0.9375 0.89230769 0.81355932 0.84210526 0.82539683 0.89655172] mean value: 0.8862733707106872 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96666667 0.85714286 0.92592593 0.96551724 0.90909091 0.85294118 0.85714286 0.92307692 0.8125 0.92857143] mean value: 0.8998575985467466 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.93548387 0.96774194 0.80645161 0.90322581 0.96774194 0.93548387 0.77419355 0.77419355 0.83870968 0.86666667] mean value: 0.8769892473118279 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.93649194 0.82762097 0.84072581 0.9203629 0.89012097 0.81149194 0.76209677 0.82043011 0.71935484 0.87083333] mean value: 0.8399529569892473 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90625 0.83333333 0.75757576 0.875 0.88235294 0.80555556 0.68571429 0.72727273 0.7027027 0.8125 ] mean value: 0.7988257303330832 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.38 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.29370928 1.2518785 1.24979663 1.24180865 1.26994014 1.25986075 1.2572484 1.2555747 1.23349094 1.23494911] mean value: 1.2548257112503052 key: score_time value: [0.09408879 0.09164119 0.08997083 0.09628367 0.09728193 0.1462996 0.09323502 0.08956718 0.08982635 0.08968997] mean value: 0.09778845310211182 key: test_mcc value: [1. 0.8566725 1. 1. 0.90662544 0.81503725 1. 0.95250095 0.95087679 0.85513419] mean value: 0.9336847119207848 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93617021 1. 1. 0.95744681 0.91489362 1. 0.97826087 0.97826087 0.93478261] mean value: 0.9699814986123959 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 1. 1. 0.96875 0.93939394 1. 0.98360656 0.98412698 0.95081967] mean value: 0.9779078105410073 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.9375 1. 1. 0.93939394 0.88571429 1. 1. 
0.96875 0.93548387] mean value: 0.9666842096075967 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 1. 1. 1. 1. 1. 0.96774194 1. 0.96666667] mean value: 0.9902150537634409 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.92137097 1. 1. 0.9375 0.875 1. 0.98387097 0.96666667 0.92083333] mean value: 0.9605241935483871 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 1. 1. 0.93939394 0.88571429 1. 0.96774194 0.96875 0.90625 ] mean value: 0.9576941069683005 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.18 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( key: fit_time value: [1.75414562 0.86297989 0.94754958 0.91948628 0.92758203 1.00107074 0.93352938 0.92137861 0.88540673 0.90511346] mean value: 1.0058242321014403 key: score_time value: [0.23915219 0.2850039 0.25384307 0.23436403 0.24242306 0.2717557 0.25083756 0.22900653 0.23912811 0.27642059] mean value: 0.2521934747695923 key: test_mcc value: [1. 0.8084425 0.90662544 1. 0.90662544 0.81503725 1. 0.90107527 0.95087679 0.80651412] mean value: 0.9095196821072326 key: train_mcc value: [0.94694186 0.96278526 0.94694186 0.94694186 0.94694186 0.96278526 0.95221511 0.95793986 0.95769694 0.96282875] mean value: 0.9544018630875426 key: test_accuracy value: [1. 0.91489362 0.95744681 1. 0.95744681 0.91489362 1. 0.95652174 0.97826087 0.91304348] mean value: 0.9592506938020352 key: train_accuracy value: [0.97619048 0.98333333 0.97619048 0.97619048 0.97619048 0.98333333 0.97857143 0.98099762 0.98099762 0.98337292] mean value: 0.9795368171021377 key: test_fscore value: [1. 0.9375 0.96875 1. 0.96875 0.93939394 1. 
0.96774194 0.98412698 0.93548387] mean value: 0.9701746729972536 key: train_fscore value: [0.9822695 0.98752228 0.9822695 0.9822695 0.9822695 0.98752228 0.98401421 0.9858156 0.98576512 0.98756661] mean value: 0.9847284121907804 key: test_precision value: [1. 0.90909091 0.93939394 1. 0.93939394 0.88571429 1. 0.96774194 0.96875 0.90625 ] mean value: 0.9516335009076945 key: train_precision value: [0.96853147 0.97879859 0.96853147 0.96853147 0.96853147 0.97879859 0.97192982 0.97202797 0.97535211 0.97887324] mean value: 0.9729906195972802 key: test_recall value: [1. 0.96774194 1. 1. 1. 1. 1. 0.96774194 1. 0.96666667] mean value: 0.9902150537634409 key: train_recall value: [0.99640288 0.99640288 0.99640288 0.99640288 0.99640288 0.99640288 0.99640288 1. 0.99640288 0.99641577] mean value: 0.9967638792192053 key: test_roc_auc value: [1. 0.89012097 0.9375 1. 0.9375 0.875 1. 0.95053763 0.96666667 0.88958333] mean value: 0.9446908602150538 key: train_roc_auc value: [0.9665113 0.97707468 0.9665113 0.9665113 0.9665113 0.97707468 0.97003242 0.97202797 0.97372591 0.97708112] mean value: 0.9713061984493545 key: test_jcc value: [1. 0.88235294 0.93939394 1. 0.93939394 0.88571429 1. 
0.9375 0.96875 0.87878788] mean value: 0.9431892984466514 key: train_jcc value: [0.96515679 0.97535211 0.96515679 0.96515679 0.96515679 0.97535211 0.96853147 0.97202797 0.97192982 0.9754386 ] mean value: 0.9699259264664533 MCC on Blind test: 0.08 Accuracy on Blind test: 0.19 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01817513 0.0076189 0.00755739 0.00759125 0.0075407 0.00757432 0.00772119 0.0076313 0.0076437 0.00751853] mean value: 0.008657240867614746 key: score_time value: [0.0108695 0.00802517 0.00807548 0.00796461 0.00793648 0.00793743 0.00874805 0.00799894 0.00802493 0.00804806] mean value: 0.008362865447998047 key: test_mcc value: [0.76746995 0.61207663 0.31752781 0.71206211 0.76032282 0.6139232 0.66402366 0.59332241 0.38733878 0.70954337] mean value: 0.6137610732708011 key: train_mcc value: [0.62791789 0.64521328 0.66619129 0.63945586 0.63982246 0.63982246 0.6506538 0.65794031 0.65846852 0.63442864] mean value: 0.6459914516114823 key: test_accuracy value: [0.89361702 0.82978723 0.70212766 0.87234043 0.89361702 0.82978723 0.85106383 0.82608696 0.73913043 0.86956522] mean value: 0.8307123034227567 key: train_accuracy value: 
[0.83809524 0.8452381 0.85238095 0.84285714 0.84285714 0.84285714 0.84761905 0.85035629 0.85035629 0.84085511] mean value: 0.8453472457866757 key: test_fscore value: [0.91803279 0.875 0.78125 0.90909091 0.92307692 0.88235294 0.88888889 0.875 0.8125 0.90625 ] mean value: 0.8771442449118437 key: train_fscore value: [0.88316151 0.88773748 0.89007092 0.8862069 0.88581315 0.88581315 0.88965517 0.89156627 0.89081456 0.88468158] mean value: 0.8875520685563664 key: test_precision value: [0.93333333 0.84848485 0.75757576 0.85714286 0.88235294 0.81081081 0.875 0.84848485 0.78787879 0.85294118] mean value: 0.8454005361358302 key: train_precision value: [0.84539474 0.8538206 0.87762238 0.85099338 0.85333333 0.85333333 0.85430464 0.85478548 0.85953177 0.85099338] mean value: 0.8554113020989377 key: test_recall value: [0.90322581 0.90322581 0.80645161 0.96774194 0.96774194 0.96774194 0.90322581 0.90322581 0.83870968 0.96666667] mean value: 0.9127956989247312 key: train_recall value: [0.92446043 0.92446043 0.9028777 0.92446043 0.92086331 0.92086331 0.92805755 0.93165468 0.92446043 0.92114695] mean value: 0.9223305226786314 key: test_roc_auc value: [0.8891129 0.7953629 0.65322581 0.82762097 0.85887097 0.76512097 0.8266129 0.78494624 0.68602151 0.82708333] mean value: 0.7913978494623656 key: train_roc_auc value: [0.79673726 0.80730064 0.82819941 0.80377951 0.80550208 0.80550208 0.8090992 0.81198118 0.81537707 0.80212277] mean value: 0.8085601200017799 key: test_jcc value: [0.84848485 0.77777778 0.64102564 0.83333333 0.85714286 0.78947368 0.8 0.77777778 0.68421053 0.82857143] mean value: 0.783779787463998 key: train_jcc value: [0.79076923 0.79813665 0.80191693 0.79566563 0.79503106 0.79503106 0.80124224 0.80434783 0.803125 0.79320988] mean value: 0.7978475494770487 MCC on Blind test: 0.24 Accuracy on Blind test: 0.47 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08977652 0.0446949 0.05097675 0.05265212 0.04966545 0.04864025 0.2227385 0.04262686 0.0463593 0.04706073] mean value: 0.06951913833618165 key: score_time value: [0.00969934 0.00960755 0.00962806 0.0097065 0.00962687 0.01001763 0.01041269 0.01037621 0.0100019 0.01042318] mean value: 0.009949994087219239 key: test_mcc value: [1. 0.8566725 1. 1. 0.90524194 0.86070252 1. 0.95250095 0.95087679 0.85513419] mean value: 0.9381128880260178 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93617021 1. 1. 0.95744681 0.93617021 1. 0.97826087 0.97826087 0.93478261] mean value: 0.972109158186864 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 
1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 1. 1. 0.96774194 0.95384615 1. 0.98360656 0.98412698 0.95081967] mean value: 0.9792522255346158 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.9375 1. 1. 0.96774194 0.91176471 1. 1. 0.96875 0.93548387] mean value: 0.9721240512333966 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 1. 1. 0.96774194 1. 1. 0.96774194 1. 0.96666667] mean value: 0.986989247311828 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.92137097 1. 1. 0.95262097 0.90625 1. 0.98387097 0.96666667 0.92083333] mean value: 0.9651612903225807 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 1. 1. 0.9375 0.91176471 1. 0.96774194 0.96875 0.90625 ] mean value: 0.9601097550457133 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.2 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, 
colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01633263 0.01602507 0.03087282 0.03793883 0.03829098 0.03755164 0.03849244 0.04578662 0.03847647 0.0386765 ] mean value: 0.033844399452209475 key: score_time value: [0.01047325 0.01068068 0.02036643 0.01072168 0.01989603 0.02082086 0.02522516 0.01081634 0.0206635 0.02184916] mean value: 0.017151308059692384 key: test_mcc value: [0.95436677 0.8566725 1. 1. 0.90662544 0.81503725 1. 0.9085301 0.90107527 0.75776742] mean value: 0.9100074758399945 key: train_mcc value: [0.94131391 0.95204958 0.93598399 0.94131391 0.94674008 0.95734993 0.93066133 0.9469026 0.9469923 0.95754545] mean value: 0.9456853089391832 key: test_accuracy value: [0.9787234 0.93617021 1. 1. 0.95744681 0.91489362 1. 0.95652174 0.95652174 0.89130435] mean value: 0.9591581868640148 key: train_accuracy value: [0.97380952 0.97857143 0.97142857 0.97380952 0.97619048 0.98095238 0.96904762 0.97624703 0.97624703 0.98099762] mean value: 0.9757301210270332 key: test_fscore value: [0.98360656 0.95238095 1. 1. 0.96875 0.93939394 1. 0.96666667 0.96774194 0.91803279] mean value: 0.9696572838187725 key: train_fscore value: [0.98039216 0.98395722 0.97864769 0.98039216 0.98220641 0.98566308 0.97690941 0.98214286 0.98220641 0.9858156 ] mean value: 0.9818332987468832 key: test_precision value: [1. 0.9375 1. 1. 0.93939394 0.88571429 1. 1. 
0.96774194 0.90322581] mean value: 0.9633575967043709 key: train_precision value: [0.97173145 0.97526502 0.96830986 0.97173145 0.97183099 0.98214286 0.96491228 0.9751773 0.97183099 0.9754386 ] mean value: 0.972837078548064 key: test_recall value: [0.96774194 0.96774194 1. 1. 1. 1. 1. 0.93548387 0.96774194 0.93333333] mean value: 0.9772043010752688 key: train_recall value: [0.98920863 0.99280576 0.98920863 0.98920863 0.99280576 0.98920863 0.98920863 0.98920863 0.99280576 0.99641577] mean value: 0.991008483535752 key: test_roc_auc value: [0.98387097 0.92137097 1. 1. 0.9375 0.875 1. 0.96774194 0.95053763 0.87291667] mean value: 0.9508938172043011 key: train_roc_auc value: [0.9664353 0.97175499 0.96291418 0.9664353 0.96823386 0.97699868 0.95939305 0.97012879 0.96843085 0.97356 ] mean value: 0.9684285006076279 key: test_jcc value: [0.96774194 0.90909091 1. 1. 0.93939394 0.88571429 1. 0.93548387 0.9375 0.84848485] mean value: 0.9423409789135595 key: train_jcc value: [0.96153846 0.96842105 0.95818815 0.96153846 0.96503497 0.97173145 0.95486111 0.96491228 0.96503497 0.97202797] mean value: 0.9643288871692625 MCC on Blind test: 0.13 Accuracy on Blind test: 0.3 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), 
('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.0186336 0.00766277 0.00761127 0.00744534 0.00747466 0.00750709 0.00777817 0.00826359 0.00808811 0.00809813] mean value: 0.00885627269744873 key: score_time value: [0.00869727 0.00830197 0.00809574 0.00786757 0.00828147 0.00783634 0.0086298 0.00836444 0.00861764 0.00868964] mean value: 0.008338189125061036 key: test_mcc value: [0.8566725 0.65994312 0.45918373 0.76032282 0.66337469 0.6139232 0.52620968 0.64852426 0.50537634 0.76764947] mean value: 0.6461179816200634 key: train_mcc value: [0.62766379 0.63945586 0.68424763 0.64471064 0.6504316 0.67304969 0.67293578 0.65214979 0.67466169 0.65101792] mean value: 0.6570324374013666 key: test_accuracy value: [0.93617021 0.85106383 0.76595745 0.89361702 0.85106383 0.82978723 0.78723404 0.84782609 0.7826087 0.89130435] mean value: 0.8436632747456059 key: train_accuracy value: [0.83809524 0.84285714 0.86190476 0.8452381 0.84761905 0.85714286 0.85714286 0.847981 0.85748219 0.847981 ] mean value: 0.8503444180522566 key: test_fscore value: [0.95238095 0.89230769 0.83076923 0.92307692 0.89552239 0.88235294 0.83870968 0.88888889 0.83870968 0.92307692] mean value: 0.8865795294575493 key: train_fscore value: [0.88356164 0.8862069 0.9 0.88850772 0.89003436 0.89655172 0.89726027 0.89041096 0.89726027 0.89003436] mean value: 0.8919828218593321 key: test_precision value: [0.9375 0.85294118 0.79411765 0.88235294 0.83333333 0.81081081 0.83870968 0.875 0.83870968 0.85714286] mean value: 0.8520618120831593 key: train_precision value: [0.84313725 0.85099338 0.86423841 0.84918033 0.85197368 0.86092715 0.85620915 
0.8496732 0.85620915 0.85478548] mean value: 0.8537327189194519 key: test_recall value: [0.96774194 0.93548387 0.87096774 0.96774194 0.96774194 0.96774194 0.83870968 0.90322581 0.83870968 1. ] mean value: 0.9258064516129032 key: train_recall value: [0.92805755 0.92446043 0.93884892 0.93165468 0.93165468 0.9352518 0.94244604 0.9352518 0.94244604 0.92831541] mean value: 0.9338387354632423 key: test_roc_auc value: [0.92137097 0.81149194 0.71673387 0.85887097 0.79637097 0.76512097 0.76310484 0.81827957 0.75268817 0.84375 ] mean value: 0.8047782258064516 key: train_roc_auc value: [0.79501469 0.80377951 0.82505826 0.80385551 0.80737663 0.81973858 0.81629344 0.80678674 0.81737687 0.80922813] mean value: 0.8104508362630897 key: test_jcc value: [0.90909091 0.80555556 0.71052632 0.85714286 0.81081081 0.78947368 0.72222222 0.8 0.72222222 0.85714286] mean value: 0.7984187434187434 key: train_jcc value: [0.79141104 0.79566563 0.81818182 0.79938272 0.80185759 0.8125 0.8136646 0.80246914 0.8136646 0.80185759] mean value: 0.80506547104786 MCC on Blind test: 0.21 Accuracy on Blind test: 0.45 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00986862 0.01313043 0.0124433 0.01303792 0.01351666 0.01400781 0.01293039 0.01448417 0.01343918 0.01248717] mean value: 0.012934565544128418 key: score_time value: [0.00865817 0.00993657 0.0099678 0.01048064 0.01074982 0.0105195 0.01045227 0.01052117 0.01057601 0.01054454] mean value: 0.010240650177001953 key: test_mcc value: [1. 0.8566725 1. 0.95436677 0.90662544 0.81503725 0.90524194 0.9085301 0.7725558 0.85513419] mean value: 0.8974163989404769 key: train_mcc value: [0.93593571 0.9627116 0.92552437 0.92120646 0.92557595 0.85221677 0.93598399 0.94195411 0.93206488 0.89469123] mean value: 0.9227865066682192 key: test_accuracy value: [1. 0.93617021 1. 0.9787234 0.95744681 0.91489362 0.95744681 0.95652174 0.89130435 0.93478261] mean value: 0.9527289546716003 key: train_accuracy value: [0.97142857 0.98333333 0.96666667 0.96428571 0.96666667 0.93333333 0.97142857 0.97387173 0.96912114 0.95249406] mean value: 0.965262979300984 key: test_fscore value: [1. 0.95238095 1. 0.98360656 0.96875 0.93939394 0.96774194 0.96666667 0.91525424 0.95081967] mean value: 0.9644613960721762 key: train_fscore value: [0.97857143 0.98743268 0.97482014 0.97277677 0.97526502 0.95172414 0.97864769 0.98053097 0.97640653 0.96527778] mean value: 0.9741453144247227 key: test_precision value: [1. 0.9375 1. 1. 0.93939394 0.88571429 0.96774194 1. 0.96428571 0.93548387] mean value: 0.9630119745845552 key: train_precision value: [0.97163121 0.98566308 0.97482014 0.98168498 0.95833333 0.91390728 0.96830986 0.96515679 0.98534799 0.93602694] mean value: 0.9640881606737391 key: test_recall value: [1. 0.96774194 1. 
0.96774194 1. 1. 0.96774194 0.93548387 0.87096774 0.96666667] mean value: 0.9676344086021506 key: train_recall value: [0.98561151 0.98920863 0.97482014 0.96402878 0.99280576 0.99280576 0.98920863 0.99640288 0.9676259 0.99641577] mean value: 0.984893375622083 key: test_roc_auc value: [1. 0.92137097 1. 0.98387097 0.9375 0.875 0.95262097 0.96774194 0.90215054 0.92083333] mean value: 0.946108870967742 key: train_roc_auc value: [0.96463674 0.98051981 0.96276218 0.96440875 0.95414936 0.90485358 0.96291418 0.9632364 0.96982694 0.93130648] mean value: 0.9558614420708662 key: test_jcc value: [1. 0.90909091 1. 0.96774194 0.93939394 0.88571429 0.9375 0.93548387 0.84375 0.90625 ] mean value: 0.9324924940650747 key: train_jcc value: [0.95804196 0.9751773 0.95087719 0.94699647 0.95172414 0.90789474 0.95818815 0.96180556 0.95390071 0.93288591] mean value: 0.9497492121318976 MCC on Blind test: 0.09 Accuracy on Blind test: 0.27 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', 
learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0119803 0.012357 0.01353669 0.01262236 0.01239181 0.01188588 0.01398087 0.01292968 0.01270461 0.01215243] mean value: 0.01265416145324707 key: score_time value: [0.01043653 0.01049995 0.01050258 0.0104773 0.01051712 0.01048827 0.0106318 0.01071954 0.01067996 0.01075029] mean value: 0.010570335388183593 key: test_mcc value: [1. 0.8084425 0.87213027 0.95299692 0.90662544 0.78063446 0.95299692 0.85009261 0.81245565 0.76471368] mean value: 0.8701088462869901 key: train_mcc value: [0.93057824 0.96269263 0.86379539 0.93066133 0.86786568 0.85610492 0.94674008 0.88991881 0.94166847 0.91286344] mean value: 0.9102888984394315 key: test_accuracy value: [1. 0.91489362 0.93617021 0.9787234 0.95744681 0.89361702 0.9787234 0.93478261 0.91304348 0.89130435] mean value: 0.9398704902867715 key: train_accuracy value: [0.96904762 0.98333333 0.93571429 0.96904762 0.94047619 0.93095238 0.97619048 0.95011876 0.97387173 0.95961995] mean value: 0.9588372356068318 key: test_fscore value: [1. 0.9375 0.94915254 0.98412698 0.96875 0.91525424 0.98412698 0.95238095 0.93333333 0.91525424] mean value: 0.9539879270917406 key: train_fscore value: [0.97682709 0.98747764 0.94990724 0.97690941 0.95667244 0.94579439 0.98220641 0.96347826 0.98025135 0.96892139] mean value: 0.9688445621247324 key: test_precision value: [1. 0.90909091 1. 
0.96875 0.93939394 0.96428571 0.96875 0.9375 0.96551724 0.93103448] mean value: 0.9584322286908494 key: train_precision value: [0.96819788 0.98220641 0.98084291 0.96491228 0.92307692 0.9844358 0.97183099 0.93265993 0.97849462 0.98880597] mean value: 0.9675463711254643 key: test_recall value: [1. 0.96774194 0.90322581 1. 1. 0.87096774 1. 0.96774194 0.90322581 0.9 ] mean value: 0.9512903225806452 key: train_recall value: [0.98561151 0.99280576 0.92086331 0.98920863 0.99280576 0.91007194 0.99280576 0.99640288 0.98201439 0.94982079] mean value: 0.971241071658802 key: test_roc_auc value: [1. 0.89012097 0.9516129 0.96875 0.9375 0.90423387 0.96875 0.9172043 0.91827957 0.8875 ] mean value: 0.9343951612903226 key: train_roc_auc value: [0.96111561 0.97879724 0.94282602 0.95939305 0.91541696 0.94095146 0.96823386 0.92827137 0.97002817 0.96434701] mean value: 0.9529380774427173 key: test_jcc value: [1. 0.88235294 0.90322581 0.96875 0.93939394 0.84375 0.96875 0.90909091 0.875 0.84375 ] mean value: 0.9134063596112932 key: train_jcc value: [0.95470383 0.97526502 0.90459364 0.95486111 0.91694352 0.89716312 0.96503497 0.9295302 0.96126761 0.93971631] mean value: 0.9399079327337388 MCC on Blind test: 0.07 Accuracy on Blind test: 0.21 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, 
n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.1008575 0.08776975 0.08655286 0.0874176 0.08782935 0.08918238 0.09195852 0.09141636 0.09183121 0.08885765] mean value: 0.09036731719970703 key: score_time value: [0.01442814 0.0153048 0.01412559 0.01522112 0.01434422 0.0145371 0.01523519 0.01551008 0.0142715 0.01540041] mean value: 0.014837813377380372 key: test_mcc value: [0.90524194 0.8566725 0.95436677 1. 0.90662544 0.81503725 0.95436677 0.95250095 0.95087679 0.75806977] mean value: 0.9053758183184529 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.95744681 0.93617021 0.9787234 1. 0.95744681 0.91489362 0.9787234 0.97826087 0.97826087 0.89130435] mean value: 0.9571230342275671 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96774194 0.95238095 0.98360656 1. 0.96875 0.93939394 0.98360656 0.98360656 0.98412698 0.92063492] mean value: 0.9683848404151815 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96774194 0.9375 1. 1. 0.93939394 0.88571429 1. 1. 0.96875 0.87878788] mean value: 0.9577888039379975 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96774194 0.96774194 0.96774194 1. 1. 1. 0.96774194 0.96774194 1. 0.96666667] mean value: 0.9805376344086022 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.95262097 0.92137097 0.98387097 1. 
0.9375 0.875 0.98387097 0.98387097 0.96666667 0.85833333] mean value: 0.9463104838709677 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.9375 0.90909091 0.96774194 1. 0.93939394 0.88571429 0.96774194 0.96774194 0.96875 0.85294118] mean value: 0.9396616117121336 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.19 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03681731 0.03343534 0.04631495 0.04816437 0.05419993 0.04136348 0.03020048 0.03114557 0.05245137 0.04301977] mean value: 0.04171125888824463 key: score_time value: [0.02169442 0.01837158 0.02844691 0.01603293 0.03581977 0.02635193 0.0178473 0.01740122 0.02229071 0.01603532] mean value: 0.02202920913696289 key: test_mcc value: [0.95299692 0.8566725 1. 1. 0.8566725 0.81503725 0.91188882 0.95250095 0.95087679 0.85927505] mean value: 0.9155920774240871 key: train_mcc value: [0.97879832 1. 0.99468526 0.98945277 0.98408467 0.99468526 0.98945277 0.99472781 0.98940987 0.98946562] mean value: 0.9904762341887853 key: test_accuracy value: [0.9787234 0.93617021 1. 1. 0.93617021 0.91489362 0.95744681 0.97826087 0.97826087 0.93478261] mean value: 0.9614708603145236 key: train_accuracy value: [0.99047619 1. 0.99761905 0.9952381 0.99285714 0.99761905 0.9952381 0.9976247 0.99524941 0.99524941] mean value: 0.995717113448705 key: test_fscore value: [0.98412698 0.95238095 1. 1. 0.95238095 0.93939394 0.96666667 0.98360656 0.98412698 0.94915254] mean value: 0.9711835578826409 key: train_fscore value: [0.99285714 1. 0.99820467 0.99638989 0.99463327 0.99820467 0.99638989 0.9981982 0.99640288 0.99640288] mean value: 0.9967683489274677 key: test_precision value: [0.96875 0.9375 1. 1. 0.9375 0.88571429 1. 1. 0.96875 0.96551724] mean value: 0.9663731527093596 key: train_precision value: [0.9858156 1. 0.99641577 1. 0.98932384 0.99641577 1. 1. 0.99640288 1. ] mean value: 0.9964373865169729 key: test_recall value: [1. 0.96774194 1. 1. 0.96774194 1. 0.93548387 0.96774194 1. 
0.93333333] mean value: 0.9772043010752688 key: train_recall value: [1. 1. 1. 0.99280576 1. 1. 0.99280576 0.99640288 0.99640288 0.99283154] mean value: 0.9971248807405688 key: test_roc_auc value: [0.96875 0.92137097 1. 1. 0.92137097 0.875 0.96774194 0.98387097 0.96666667 0.93541667] mean value: 0.954018817204301 key: train_roc_auc value: [0.98591549 1. 0.99647887 0.99640288 0.98943662 0.99647887 0.99640288 0.99820144 0.99470494 0.99641577] mean value: 0.995043775936127 key: test_jcc value: [0.96875 0.90909091 1. 1. 0.90909091 0.88571429 0.93548387 0.96774194 0.96875 0.90322581] mean value: 0.9447847716799329 key: train_jcc value: [0.9858156 1. 0.99641577 0.99280576 0.98932384 0.99641577 0.99280576 0.99640288 0.99283154 0.99283154] mean value: 0.9935648458398372 MCC on Blind test: 0.07 Accuracy on Blind test: 0.19 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, 
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.06595278 0.07601976 0.07935166 0.12056971 0.07869649 0.06834197 0.13297486 0.15896058 0.14390469 0.13192463] mean value: 0.10566971302032471 key: score_time value: [0.01232171 0.01868176 0.01198316 0.01882172 0.01206756 0.01199055 0.01884794 0.02548599 0.02592111 0.0252378 ] mean value: 0.018135929107666017 key: test_mcc value: [0.90662544 0.60908698 0.4512753 0.65994312 0.71206211 0.6139232 0.66402366 0.59332241 0.43161973 0.76764947] mean value: 0.6409531430663058 key: train_mcc value: [0.80273059 0.7991351 0.79087061 0.79295441 0.78611575 0.79743374 0.78683895 0.80017613 0.80374289 0.79643548] mean value: 0.7956433649163105 key: test_accuracy value: [0.95744681 0.82978723 0.76595745 0.85106383 0.87234043 0.82978723 0.85106383 0.82608696 0.76086957 0.89130435] mean value: 0.8435707678075856 key: train_accuracy value: [0.91190476 0.90952381 0.90714286 0.90714286 0.9047619 0.90952381 0.9047619 0.90973872 0.91211401 0.90973872] mean value: 0.9086353353693021 key: test_fscore value: [0.96875 0.87878788 0.8358209 0.89230769 0.90909091 0.88235294 0.88888889 0.875 0.83076923 0.92307692] mean value: 0.8884845359620381 key: train_fscore value: [0.93653516 0.93537415 0.93287435 0.93356048 0.93150685 0.93493151 0.93174061 0.93537415 0.93653516 0.9347079 ] mean value: 0.9343140331061971 key: test_precision value: [0.93939394 0.82857143 0.77777778 0.85294118 0.85714286 0.81081081 0.875 0.84848485 0.79411765 0.85714286] mean value: 0.8441383342853931 key: train_precision value: [0.89508197 0.88709677 0.89438944 0.88673139 
0.88888889 0.89215686 0.88636364 0.88709677 0.89508197 0.89768977] mean value: 0.8910577470317502 key: test_recall value: [1. 0.93548387 0.90322581 0.93548387 0.96774194 0.96774194 0.90322581 0.90322581 0.87096774 1. ] mean value: 0.9387096774193548 key: train_recall value: [0.98201439 0.98920863 0.97482014 0.98561151 0.97841727 0.98201439 0.98201439 0.98920863 0.98201439 0.97491039] mean value: 0.9820234135272428 key: test_roc_auc value: [0.9375 0.78024194 0.7016129 0.81149194 0.82762097 0.76512097 0.8266129 0.78494624 0.70215054 0.84375 ] mean value: 0.7981048387096774 key: train_roc_auc value: [0.87833114 0.87136488 0.87473402 0.86956632 0.86949032 0.87481001 0.86776776 0.87222669 0.87911908 0.87830027] mean value: 0.8735710488300057 key: test_jcc value: [0.93939394 0.78378378 0.71794872 0.80555556 0.83333333 0.78947368 0.8 0.77777778 0.71052632 0.85714286] mean value: 0.8014935964935965 key: train_jcc value: [0.88064516 0.87859425 0.87419355 0.87539936 0.87179487 0.8778135 0.87220447 0.87859425 0.88064516 0.87741935] mean value: 0.8767303934692845 MCC on Blind test: 0.21 Accuracy on Blind test: 0.42 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.2230103 0.21394682 0.20187664 0.20994234 0.20898438 0.21629405 0.21086693 0.20910215 0.21160555 0.20683503] mean value: 0.21124641895294188 key: score_time value: [0.00933719 0.00840378 0.00872827 0.00917697 0.00930619 0.00924182 0.00842547 0.00914001 0.00950432 0.00904679] mean value: 0.009031081199645996 key: test_mcc value: [1. 0.8566725 1. 1. 0.95299692 0.81503725 1. 0.95250095 0.95087679 0.80833333] mean value: 0.9336417737001077 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93617021 1. 1. 0.9787234 0.91489362 1. 0.97826087 0.97826087 0.91304348] mean value: 0.9699352451433858 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 1. 1. 0.98412698 0.93939394 1. 0.98360656 0.98412698 0.93333333] mean value: 0.9776968750739242 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.9375 1. 1. 0.96875 0.88571429 1. 1. 0.96875 0.93333333] mean value: 0.9694047619047619 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 1. 1. 1. 1. 1. 0.96774194 1. 0.93333333] mean value: 0.9868817204301076 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.92137097 1. 1. 0.96875 0.875 1. 0.98387097 0.96666667 0.90416667] mean value: 0.9619825268817205 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 
0.90909091 1. 1. 0.96875 0.88571429 1. 0.96774194 0.96875 0.875 ] mean value: 0.9575047130289066 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.19 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting',
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.0117166 0.01313019 0.01318526 0.01325989 0.01305079 0.01312709 0.01312232 0.01324248 0.01329851 0.0137887 ] mean value: 0.01309218406677246 key: score_time value: [0.0111506 0.01089978 0.01084971 0.0108676 0.01087546 0.01084447 0.01105189 0.01162434 0.01162648 0.01165462] mean value: 0.011144495010375977 key: test_mcc value: [0.46502704 0.68913865 0.66402366 0.71206211 0.6139232 0.67402153 0.62096774 0.74844698 0.44695591 0.53674504] mean value: 0.6171311872005444 key: train_mcc value: [0.6778431 0.7128472 0.85474068 0.79307454 0.73273261 0.88954988 0.79770673 0.82923345 0.77993671 0.88249782] mean value: 0.7950162701330918 key: test_accuracy value: [0.70212766 0.85106383 0.85106383 0.87234043 0.82978723 0.85106383 0.82978723 0.89130435 0.76086957 0.7826087 ] mean value: 0.8222016651248844 key: train_accuracy value: [0.82142857 0.8452381 0.93333333 0.9047619 0.88095238 0.95 0.90714286 0.9239905 0.90261283 0.94536817] mean value: 0.9014828639294198 
key: test_fscore value: [0.73076923 0.88135593 0.88888889 0.90909091 0.88235294 0.8852459 0.87096774 0.92307692 0.82539683 0.82758621] mean value: 0.8624731501074018 key: train_fscore value: [0.84662577 0.86973948 0.94871795 0.92647059 0.91582492 0.96188748 0.92844037 0.94425087 0.92794376 0.95779817] mean value: 0.9227699340095629 key: test_precision value: [0.9047619 0.92857143 0.875 0.85714286 0.81081081 0.9 0.87096774 0.88235294 0.8125 0.85714286] mean value: 0.8699250541541813 key: train_precision value: [0.98104265 0.98190045 0.96641791 0.94736842 0.86075949 0.97069597 0.94756554 0.91554054 0.90721649 0.98120301] mean value: 0.9459710488360232 key: test_recall value: [0.61290323 0.83870968 0.90322581 0.96774194 0.96774194 0.87096774 0.87096774 0.96774194 0.83870968 0.8 ] mean value: 0.8638709677419355 key: train_recall value: [0.74460432 0.78057554 0.93165468 0.90647482 0.97841727 0.95323741 0.91007194 0.97482014 0.94964029 0.93548387] mean value: 0.906498027384544 key: test_roc_auc value: [0.74395161 0.85685484 0.8266129 0.82762097 0.76512097 0.84173387 0.81048387 0.85053763 0.71935484 0.775 ] mean value: 0.8017271505376344 key: train_roc_auc value: [0.85821765 0.87620326 0.9341372 0.90394164 0.83427906 0.94844969 0.9057402 0.89999748 0.88041455 0.9501363 ] mean value: 0.8991517025527074 key: test_jcc value: [0.57575758 0.78787879 0.8 0.83333333 0.78947368 0.79411765 0.77142857 0.85714286 0.7027027 0.70588235] mean value: 0.7617717512454355 key: train_jcc value: [0.73404255 0.76950355 0.90243902 0.8630137 0.8447205 0.92657343 0.86643836 0.89438944 0.86557377 0.91901408] mean value: 0.8585708395886121 MCC on Blind test: 0.13 Accuracy on Blind test: 0.63 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02074528 0.02051353 0.01858282 0.02953506 0.02967763 0.0314672 0.02946687 0.02941847 0.02958179 0.02945447] mean value: 0.026844310760498046 key: score_time value: [0.02140307 0.01061296 0.01083541 0.02066278 0.0109098 0.01821399 0.02039957 0.01891303 0.02117467 0.01968718] mean value: 0.017281246185302735 key: test_mcc value: [0.95299692 0.8084425 0.8566725 0.95299692 0.90662544 0.76032282 0.90662544 0.80215054 0.75776742 0.85513419] mean value: 0.8559734697377736 key: train_mcc value: [0.92003671 0.92030205 0.87684521 0.89326029 0.93085643 0.90414739 0.88770942 0.9151442 0.88322214 0.90932054] mean value: 0.9040844381960059 key: test_accuracy value: [0.9787234 0.91489362 0.93617021 0.9787234 0.95744681 0.89361702 0.95744681 0.91304348 0.89130435 0.93478261] mean value: 0.9356151711378353 key: train_accuracy value: [0.96428571 0.96428571 0.9452381 0.95238095 0.96904762 0.95714286 0.95 0.96199525 0.94774347 0.95961995] mean value: 0.9571739622214681 key: test_fscore value: [0.98412698 0.9375 0.95238095 0.98412698 0.96875 0.92307692 0.96875 0.93548387 0.91803279 0.95081967] mean value: 0.9523048173695978 key: train_fscore value: [0.97345133 0.97354497 0.95943563 0.96478873 0.97699115 0.96830986 0.96296296 0.97173145 0.96140351 0.97001764] mean value: 0.9682637226255115 key: test_precision value: [0.96875 0.90909091 0.9375 0.96875 0.93939394 0.88235294 0.93939394 
0.93548387 0.93333333 0.93548387] mean value: 0.9349532804324076 key: train_precision value: [0.95818815 0.9550173 0.94117647 0.94482759 0.96167247 0.94827586 0.94463668 0.95486111 0.93835616 0.95486111] mean value: 0.9501872911886335 key: test_recall value: [1. 0.96774194 0.96774194 1. 1. 0.96774194 1. 0.93548387 0.90322581 0.96666667] mean value: 0.9708602150537634 key: train_recall value: [0.98920863 0.99280576 0.97841727 0.98561151 0.99280576 0.98920863 0.98201439 0.98920863 0.98561151 0.98566308] mean value: 0.9870555168768211 key: test_roc_auc value: [0.96875 0.89012097 0.92137097 0.96875 0.9375 0.85887097 0.9375 0.90107527 0.88494624 0.92083333] mean value: 0.9189717741935484 key: train_roc_auc value: [0.9523508 0.95062823 0.92934948 0.93646773 0.95767048 0.94178742 0.93466917 0.94914977 0.92986869 0.94705689] mean value: 0.9428998652048836 key: test_jcc value: [0.96875 0.88235294 0.90909091 0.96875 0.93939394 0.85714286 0.93939394 0.87878788 0.84848485 0.90625 ] mean value: 0.9098397313470843 key: train_jcc value: [0.94827586 0.94845361 0.9220339 0.93197279 0.9550173 0.93856655 0.92857143 0.94501718 0.92567568 0.94178082] mean value: 0.9385365119971703 MCC on Blind test: 0.18 Accuracy on Blind test: 0.4 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:122: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:125: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy 
baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), 
('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.20578527 0.09308887 0.1543951 0.2597549 0.22493172 0.19128704 0.15882158 0.10618186 0.18721414 0.18994951] mean value: 0.17714099884033202 key: score_time value: [0.01115108 0.01121378 0.0221827 0.02100086 0.02120638 0.02156854 0.01102662 0.0211966 0.01679158 0.01401901] mean value: 0.01713571548461914 key: test_mcc value: [1. 0.8566725 1. 0.95299692 0.90662544 0.81503725 0.95299692 0.9085301 0.90107527 0.85513419] mean value: 0.914906858369952 key: train_mcc value: [0.92522791 0.94131391 0.91988445 0.92534566 0.93598399 0.94131391 0.93066133 0.94171645 0.93099139 0.94680199] mean value: 0.9339241011569926 key: test_accuracy value: [1. 0.93617021 1. 0.9787234 0.95744681 0.91489362 0.9787234 0.95652174 0.95652174 0.93478261] mean value: 0.9613783533765032 key: train_accuracy value: [0.96666667 0.97380952 0.96428571 0.96666667 0.97142857 0.97380952 0.96904762 0.97387173 0.96912114 0.97624703] mean value: 0.970495419070241 key: test_fscore value: [1. 0.95238095 1. 
0.98412698 0.96875 0.93939394 0.98412698 0.96666667 0.96774194 0.95081967] mean value: 0.9714007134310545 key: train_fscore value: [0.97508897 0.98039216 0.97335702 0.9751773 0.97864769 0.98039216 0.97690941 0.98046181 0.97690941 0.9822695 ] mean value: 0.9779605432457805 key: test_precision value: [1. 0.9375 1. 0.96875 0.93939394 0.88571429 0.96875 1. 0.96774194 0.93548387] mean value: 0.9603334031559838 key: train_precision value: [0.96478873 0.97173145 0.96140351 0.96153846 0.96830986 0.97173145 0.96491228 0.96842105 0.96491228 0.97192982] mean value: 0.9669678897982681 key: test_recall value: [1. 0.96774194 1. 1. 1. 1. 1. 0.93548387 0.96774194 0.96666667] mean value: 0.983763440860215 key: train_recall value: [0.98561151 0.98920863 0.98561151 0.98920863 0.98920863 0.98920863 0.98920863 0.99280576 0.98920863 0.99283154] mean value: 0.9892112116758206 key: test_roc_auc value: [1. 0.92137097 1. 0.96875 0.9375 0.875 0.96875 0.96774194 0.95053763 0.92083333] mean value: 0.9510483870967742 key: train_roc_auc value: [0.95759449 0.9664353 0.95407336 0.95587192 0.96291418 0.9664353 0.95939305 0.96493435 0.95963928 0.96824676] mean value: 0.9615537984903284 key: test_jcc value: [1. 0.90909091 1. 
0.96875 0.93939394 0.88571429 0.96875 0.93548387 0.9375 0.90625 ] mean value: 0.9450933005166876 key: train_jcc value: [0.95138889 0.96153846 0.94809689 0.95155709 0.95818815 0.96153846 0.95486111 0.96167247 0.95486111 0.96515679] mean value: 0.9568859435029576 MCC on Blind test: 0.14 Accuracy on Blind test: 0.33 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost 
Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02589321 0.0243125 0.02480197 0.02414203 0.02598643 0.0267787 0.02300882 0.02308178 0.02694511 0.02658701] mean value: 0.025153756141662598 key: score_time value: [0.01105022 0.01108479 0.02707553 0.01083922 0.01093078 0.01091671 0.01093459 0.01086307 0.01094246 0.01091409] mean value: 0.012555146217346191 key: test_mcc value: [1. 0.7130241 0.77784447 0.83914639 0.87096774 0.87096774 0.74193548 0.84266484 0.67314268 0.8688172 ] mean value: 0.8198510652102912 key: train_mcc value: [0.87415162 0.85611511 0.87052613 0.84894283 0.84894283 0.84892086 0.85256763 0.84537297 0.86364692 0.85997009] mean value: 0.8569156981998511 key: test_accuracy value: [1. 
0.85483871 0.88709677 0.91935484 0.93548387 0.93548387 0.87096774 0.91935484 0.83606557 0.93442623] mean value: 0.9093072448439978 key: train_accuracy value: [0.93705036 0.92805755 0.9352518 0.92446043 0.92446043 0.92446043 0.92625899 0.92266187 0.93177738 0.92998205] mean value: 0.9284421295997314 key: test_fscore value: [1. 0.86153846 0.89230769 0.92063492 0.93548387 0.93548387 0.87096774 0.91525424 0.84375 0.93333333] mean value: 0.9108754128973511 key: train_fscore value: [0.93670886 0.92805755 0.93548387 0.92473118 0.92473118 0.92446043 0.92665474 0.92307692 0.93214286 0.92998205] mean value: 0.9286029650436789 key: test_precision value: [1. 0.82352941 0.85294118 0.90625 0.93548387 0.93548387 0.87096774 0.96428571 0.81818182 0.93333333] mean value: 0.9040456937907128 key: train_precision value: [0.94181818 0.92805755 0.93214286 0.92142857 0.92142857 0.92446043 0.92170819 0.91814947 0.92553191 0.93165468] mean value: 0.9266380409827853 key: test_recall value: [1. 0.90322581 0.93548387 0.93548387 0.93548387 0.93548387 0.87096774 0.87096774 0.87096774 0.93333333] mean value: 0.9191397849462365 key: train_recall value: [0.93165468 0.92805755 0.93884892 0.92805755 0.92805755 0.92446043 0.93165468 0.92805755 0.93884892 0.92831541] mean value: 0.9306013253912999 key: test_roc_auc value: [1. 0.85483871 0.88709677 0.91935484 0.93548387 0.93548387 0.87096774 0.91935484 0.83548387 0.9344086 ] mean value: 0.909247311827957 key: train_roc_auc value: [0.93705036 0.92805755 0.9352518 0.92446043 0.92446043 0.92446043 0.92625899 0.92266187 0.93179005 0.92998504] mean value: 0.9284436966555788 key: test_jcc value: [1. 
0.75675676 0.80555556 0.85294118 0.87878788 0.87878788 0.77142857 0.84375 0.72972973 0.875 ] mean value: 0.839273754751696 key: train_jcc value: [0.88095238 0.86577181 0.87878788 0.86 0.86 0.85953177 0.86333333 0.85714286 0.8729097 0.86912752] mean value: 0.8667557250647416 MCC on Blind test: 0.21 Accuracy on Blind test: 0.5 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
    ColumnTransformer(remainder='passthrough',
        transformers=[('num', MinMaxScaler(),
            Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                   'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')),
            ('cat', OneHotEncoder(),
            Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
    ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.84429264 0.72940993 0.72171068 0.85939646 0.69464445 0.72773337 0.77860117 0.70124364 0.78092885 0.7428112 ]
mean value: 0.7580772399902344

key: score_time
value: [0.01205468 0.01223755 0.01254439 0.01247644 0.02100563 0.01274776 0.01243854 0.01249003 0.01463079 0.01232004]
mean value: 0.013494586944580078

key: test_mcc
value: [0.96824584 0.93548387 0.96824584 0.90748521 0.90369611 0.93548387 1. 0.87278605 0.90215054 0.8688172 ]
mean value: 0.9262394532240339

key: train_mcc
value: [0.94966486 0.96412858 0.94604929 0.96763216 0.96405373 0.96405373 0.94604929 0.97482645 0.96774069 0.96783888]
mean value: 0.9612037646576601

key: test_accuracy
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194 1. 0.93548387 0.95081967 0.93442623]
mean value: 0.9627181385510312

key: train_accuracy
value: [0.97482014 0.98201439 0.97302158 0.98381295 0.98201439 0.98201439 0.97302158 0.98741007 0.98384201 0.98384201]
mean value: 0.9805813517946863

key: test_fscore
value: [0.98412698 0.96774194 0.98360656 0.95384615 0.95081967 0.96774194 1. 0.93333333 0.95081967 0.93333333]
mean value: 0.9625369577246891

key: train_fscore
value: [0.97491039 0.98214286 0.97307002 0.98384201 0.98207885 0.98207885 0.97307002 0.98743268 0.98389982 0.98401421]
mean value: 0.9806539709925397

key: test_precision
value: [0.96875 0.96774194 1. 0.91176471 0.96666667 0.96774194 1. 0.96551724 0.96666667 0.93333333]
mean value: 0.9648182484896072

key: train_precision
value: [0.97142857 0.9751773 0.97132616 0.98207885 0.97857143 0.97857143 0.97132616 0.98566308 0.97864769 0.97535211]
mean value: 0.9768142798277739

key: test_recall
value: [1. 0.96774194 0.96774194 1. 0.93548387 0.96774194 1. 0.90322581 0.93548387 0.93333333]
mean value: 0.9610752688172043

key: train_recall
value: [0.97841727 0.98920863 0.97482014 0.98561151 0.98561151 0.98561151 0.97482014 0.98920863 0.98920863 0.99283154]
mean value: 0.9845349526830148

key: test_roc_auc
value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194 1. 0.93548387 0.95107527 0.9344086 ]
mean value: 0.962741935483871

key: train_roc_auc
value: [0.97482014 0.98201439 0.97302158 0.98381295 0.98201439 0.98201439 0.97302158 0.98741007 0.98385163 0.98382584]
mean value: 0.9805806967329362

key: test_jcc
value: [0.96875 0.9375 0.96774194 0.91176471 0.90625 0.9375 1. 0.875 0.90625 0.875 ]
mean value: 0.9285756641366224

key: train_jcc
value: [0.95104895 0.96491228 0.94755245 0.96819788 0.96478873 0.96478873 0.94755245 0.9751773 0.96830986 0.96853147]
mean value: 0.9620860104153928

MCC on Blind test: 0.14
Accuracy on Blind test: 0.35

Model_name: Gaussian NB
Model func: GaussianNB()
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep',
    ColumnTransformer(remainder='passthrough',
        transformers=[('num', MinMaxScaler(),
            Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
                   'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')),
            ('cat', OneHotEncoder(),
            Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
    ('model', GaussianNB())])

key: fit_time
value: [0.01086211 0.01025057 0.00859928 0.00814486 0.00849771 0.00856209 0.0083437 0.00834632 0.00836563 0.0083468 ]
mean value: 0.00883190631866455

key: score_time
value: [0.01082826 0.00907207 0.00906467 0.00892687 0.00863767 0.00889754 0.00867772 0.00834537 0.00863099 0.00864482]
mean value: 0.008972597122192384

key: test_mcc
value: [0.83914639 0.64820372 0.71004695 0.81325006 0.80645161 0.74348441 0.61418277 0.87278605 0.60645161 0.70505961]
mean value: 0.7359063194782269

key: train_mcc
value: [0.75529076 0.7627676 0.76266888 0.74820144 0.73741484 0.74837576 0.74460913 0.73025835 0.76301539 0.75249226]
mean value: 0.7505094421634964

key: test_accuracy
value: [0.91935484 0.82258065 0.85483871 0.90322581 0.90322581 0.87096774 0.80645161 0.93548387 0.80327869 0.85245902]
mean value: 0.8671866737176097

key: train_accuracy
value: [0.87410072 0.88129496 0.88129496 0.87410072 0.86870504 0.87410072 0.87230216 0.86510791 0.88150808 0.87612208]
mean value: 0.8748637355824497

key: test_fscore
value: [0.92063492 0.83076923 0.85245902 0.90909091 0.90322581 0.875 0.8 0.93333333 0.80645161 0.84745763]
mean value: 0.8678422456695318

key: train_fscore
value: [0.88215488 0.88 0.88214286 0.87410072 0.86894075 0.87272727 0.87253142 0.86437613 0.88129496 0.87477314]
mean value: 0.8753042137774966

key: test_precision
value: [0.90625 0.79411765 0.86666667 0.85714286 0.90322581 0.84848485 0.82758621 0.96551724 0.80645161 0.86206897]
mean value: 0.8637511852501137

key: train_precision
value: [0.82911392 0.88970588 0.87588652 0.87410072 0.86738351 0.88235294 0.87096774 0.86909091 0.88129496 0.88602941]
mean value: 0.8725926531191879

key: test_recall
value: [0.93548387 0.87096774 0.83870968 0.96774194 0.90322581 0.90322581 0.77419355 0.90322581 0.80645161 0.83333333]
mean value: 0.8736559139784946

key: train_recall
value: [0.94244604 0.8705036 0.88848921 0.87410072 0.8705036 0.86330935 0.87410072 0.85971223 0.88129496 0.86379928]
mean value: 0.8788259714808798

key: test_roc_auc
value: [0.91935484 0.82258065 0.85483871 0.90322581 0.90322581 0.87096774 0.80645161 0.93548387 0.80322581 0.85215054]
mean value: 0.8671505376344086

key: train_roc_auc
value: [0.87410072 0.88129496 0.88129496 0.87410072 0.86870504 0.87410072 0.87230216 0.86510791 0.8815077 0.87614425]
mean value: 0.8748659137206364

key: test_jcc
value: [0.85294118 0.71052632 0.74285714 0.83333333 0.82352941 0.77777778 0.66666667 0.875 0.67567568 0.73529412]
mean value: 0.7693601617982423

key: train_jcc
value: [0.78915663 0.78571429 0.78913738 0.77635783 0.76825397 0.77419355 0.77388535 0.7611465 0.78778135 0.77741935]
mean value: 0.778304618898389

MCC on Blind test: 0.2
Accuracy on Blind test: 0.54

Model_name: Naive Bayes
Model func: BernoulliNB()
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic
RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00874949 0.00879979 0.0084486 0.00865364 0.00868559 0.00862646 0.00848246 0.00858378 0.00882339 0.00859857] mean value: 0.008645176887512207 key: score_time value: [0.00911641 0.00886369 0.00856209 0.00891066 0.00887418 0.00860906 0.00873876 0.00870085 0.00862622 0.008708 ] mean value: 0.008770990371704101 key: test_mcc value: [0.64820372 0.68313005 0.48488114 0.74348441 0.80813523 0.74348441 0.64820372 0.74193548 0.63978495 0.67204301] mean value: 0.6813286129520032 key: train_mcc value: [0.69129181 0.69623388 0.69785979 0.69872831 0.69209976 0.70569372 0.7019886 0.70220704 0.69929441 0.69881448] mean value: 0.698421180066379 key: test_accuracy value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774 0.82258065 0.87096774 0.81967213 0.83606557] mean value: 0.8397673188789001 key: train_accuracy value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799 0.85071942 0.85071942 0.8491921 0.8491921 ] mean value: 0.8487952546400941 key: test_fscore value: [0.81355932 0.84848485 0.75 0.875 0.9 0.875 0.83076923 0.87096774 0.81967213 0.83333333] mean value: 0.8416786607704336 key: train_fscore value: [0.84859155 0.85268631 0.84837545 0.85263158 0.85017422 0.8556338 0.85361552 
0.85413005 0.85263158 0.85211268] mean value: 0.8520582734853629 key: test_precision value: [0.85714286 0.8 0.72727273 0.84848485 0.93103448 0.84848485 0.79411765 0.87096774 0.83333333 0.83333333] mean value: 0.8344171819804876 key: train_precision value: [0.83103448 0.82274247 0.85144928 0.83219178 0.82432432 0.83793103 0.83737024 0.83505155 0.83219178 0.83737024] mean value: 0.8341657184309065 key: test_recall value: [0.77419355 0.90322581 0.77419355 0.90322581 0.87096774 0.90322581 0.87096774 0.87096774 0.80645161 0.83333333] mean value: 0.8510752688172043 key: train_recall value: [0.86690647 0.88489209 0.84532374 0.87410072 0.87769784 0.87410072 0.8705036 0.87410072 0.87410072 0.86738351] mean value: 0.8709110131249839 key: test_roc_auc value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774 0.82258065 0.87096774 0.81989247 0.83602151] mean value: 0.8397849462365592 key: train_roc_auc value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799 0.85071942 0.85071942 0.84923674 0.84915938] mean value: 0.8487964467135968 key: test_jcc value: [0.68571429 0.73684211 0.6 0.77777778 0.81818182 0.77777778 0.71052632 0.77142857 0.69444444 0.71428571] mean value: 0.7286978810663021 key: train_jcc value: [0.73700306 0.74320242 0.73667712 0.74311927 0.73939394 0.74769231 0.74461538 0.74539877 0.74311927 0.74233129] mean value: 0.7422552816171282 MCC on Blind test: 0.19 Accuracy on Blind test: 0.49 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00824833 0.00824618 0.00818968 0.00823784 0.00803328 0.00808263 0.0080018 0.00826025 0.00804806 0.00802422] mean value: 0.008137226104736328 key: score_time value: [0.02001548 0.0169642 0.01295042 0.01175404 0.01527023 0.01145744 0.01146245 0.01176381 0.01168776 0.01165533] mean value: 0.01349811553955078 key: test_mcc value: [0.75623534 0.67741935 0.64820372 0.83914639 0.80813523 0.74193548 0.61418277 0.68313005 0.67204301 0.67721392] mean value: 0.7117645281572317 key: train_mcc value: [0.75664991 0.80977699 0.79501032 0.78789723 0.7814304 0.77770329 0.79138739 0.77342633 0.78180276 0.78587941] mean value: 0.7840964017204444 key: test_accuracy value: [0.87096774 0.83870968 0.82258065 0.91935484 0.90322581 0.87096774 0.80645161 0.83870968 0.83606557 0.83606557] mean value: 0.8543098889476468 key: train_accuracy value: [0.87769784 0.90467626 0.89748201 0.89388489 0.89028777 0.88848921 0.89568345 0.88669065 0.89048474 0.89228007] mean value: 0.8917656897821061 key: test_fscore value: [0.85714286 0.83870968 0.81355932 0.92063492 0.90625 0.87096774 0.8125 0.82758621 0.83870968 0.82142857] mean value: 0.8507488974910993 key: train_fscore value: [0.87407407 0.90310786 0.89692586 0.89292196 0.88766114 0.88602941 0.89605735 0.88607595 0.88766114 0.88929889] mean value: 0.8899813639558726 key: test_precision value: [0.96 0.83870968 0.85714286 0.90625 0.87878788 0.87096774 0.78787879 0.88888889 0.83870968 0.88461538] mean value: 0.8711950894087991 key: train_precision value: [0.90076336 0.91821561 0.90181818 0.9010989 0.90943396 0.90601504 
0.89285714 0.89090909 0.90943396 0.91634981] mean value: 0.9046895060853061 key: test_recall value: [0.77419355 0.83870968 0.77419355 0.93548387 0.93548387 0.87096774 0.83870968 0.77419355 0.83870968 0.76666667] mean value: 0.8347311827956989 key: train_recall value: [0.84892086 0.88848921 0.89208633 0.88489209 0.86690647 0.86690647 0.89928058 0.88129496 0.86690647 0.86379928] mean value: 0.8759482736391532 key: test_roc_auc value: [0.87096774 0.83870968 0.82258065 0.91935484 0.90322581 0.87096774 0.80645161 0.83870968 0.83602151 0.83494624] mean value: 0.8541935483870968 key: train_roc_auc value: [0.87769784 0.90467626 0.89748201 0.89388489 0.89028777 0.88848921 0.89568345 0.88669065 0.89044248 0.8923313 ] mean value: 0.8917665867306155 key: test_jcc value: [0.75 0.72222222 0.68571429 0.85294118 0.82857143 0.77142857 0.68421053 0.70588235 0.72222222 0.6969697 ] mean value: 0.7420162482855981 key: train_jcc value: [0.77631579 0.82333333 0.81311475 0.80655738 0.79801325 0.79537954 0.81168831 0.79545455 0.79801325 0.80066445] mean value: 0.8018534590944678 MCC on Blind test: 0.17 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, 
booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01743269 0.01715159 0.01687074 0.01602173 0.01673126 0.01642895 0.01583314 0.01826096 0.01568484 0.0176158 ] mean value: 0.016803169250488283 key: score_time value: [0.01035261 0.00926685 0.01012945 0.00933409 0.00936246 0.01025271 0.00945616 0.01029134 0.00932956 0.0092721 ] mean value: 0.00970473289489746 key: test_mcc value: [0.93548387 0.69047575 0.62471615 0.77784447 0.77784447 0.75623534 0.58338335 0.74348441 0.61090565 0.81062315] mean value: 0.7310996615906107 key: train_mcc value: [0.82186847 0.79485081 0.75204143 0.78877892 0.78485761 0.7611094 0.79209132 0.77560672 0.78260516 0.81085297] mean value: 0.7864662785132636 key: test_accuracy value: [0.96774194 0.83870968 0.80645161 0.88709677 0.88709677 0.87096774 0.79032258 0.87096774 0.80327869 0.90163934] mean value: 0.8624272871496562 key: train_accuracy value: [0.91007194 0.89568345 0.87230216 0.89208633 0.89028777 0.87769784 0.89388489 0.88489209 0.88868941 0.9048474 ] mean value: 0.8910443279128941 key: test_fscore value: [0.96774194 0.85294118 0.82352941 0.89230769 0.89230769 0.88235294 0.8 0.875 0.81818182 0.90625 ] mean value: 0.8710612667692839 key: train_fscore value: [0.91289199 0.90034364 0.88067227 0.89761092 0.8957265 0.88474576 0.8991453 0.89152542 0.89455782 0.90750436] mean value: 0.8964723986527141 key: test_precision value: [0.96774194 0.78378378 0.75675676 0.85294118 0.85294118 0.81081081 0.76470588 0.84848485 0.77142857 0.85294118] mean value: 0.8262536118513348 key: train_precision value: [0.88513514 0.86184211 0.82649842 0.8538961 0.8534202 0.83653846 
0.85667752 0.84294872 0.8483871 0.88435374] mean value: 0.8549697504635009 key: test_recall value: [0.96774194 0.93548387 0.90322581 0.93548387 0.93548387 0.96774194 0.83870968 0.90322581 0.87096774 0.96666667] mean value: 0.9224731182795699 key: train_recall value: [0.94244604 0.94244604 0.94244604 0.94604317 0.94244604 0.93884892 0.94604317 0.94604317 0.94604317 0.93189964] mean value: 0.9424705396972745 key: test_roc_auc value: [0.96774194 0.83870968 0.80645161 0.88709677 0.88709677 0.87096774 0.79032258 0.87096774 0.80215054 0.90268817] mean value: 0.8624193548387098 key: train_roc_auc value: [0.91007194 0.89568345 0.87230216 0.89208633 0.89028777 0.87769784 0.89388489 0.88489209 0.88879219 0.90479874] mean value: 0.8910497408524792 key: test_jcc value: [0.9375 0.74358974 0.7 0.80555556 0.80555556 0.78947368 0.66666667 0.77777778 0.69230769 0.82857143] mean value: 0.7746998104234947 key: train_jcc value: [0.83974359 0.81875 0.78678679 0.81424149 0.81114551 0.79331307 0.81677019 0.80428135 0.80923077 0.83067093] mean value: 0.812493367099271 MCC on Blind test: 0.25 Accuracy on Blind test: 0.48 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.58698964 1.47472477 1.58772182 1.53978014 1.45300221 1.59102941 1.59130311 1.50179529 1.59061027 1.5674026 ] mean value: 1.548435926437378 key: score_time value: [0.01429367 0.01347637 0.01343799 0.0135057 0.01355076 0.01363134 0.01368833 0.01345825 0.01342821 0.01383781] mean value: 0.013630843162536621 key: test_mcc value: [0.96824584 0.84266484 0.87278605 0.93743687 0.93743687 0.90369611 0.90369611 0.90748521 0.83638369 0.8688172 ] mean value: 0.8978648796280239 key: train_mcc value: [0.99283145 0.98921503 0.98921503 0.98561151 0.98921503 0.98921503 0.98561151 0.99640932 0.99284416 0.99641577] mean value: 0.9906583855147647 key: test_accuracy value: [0.98387097 0.91935484 0.93548387 0.96774194 0.96774194 0.9516129 0.9516129 0.9516129 0.91803279 0.93442623] mean value: 0.9481491274457959 key: train_accuracy value: [0.99640288 0.99460432 0.99460432 0.99280576 0.99460432 0.99460432 0.99280576 0.99820144 0.99640934 0.99820467] mean value: 0.9953247097115845 key: test_fscore value: [0.98412698 0.92307692 0.9375 0.96875 0.96875 0.95081967 0.95238095 0.94915254 0.92063492 0.93333333] mean value: 
0.9488525328057142 key: train_fscore value: [0.99638989 0.99459459 0.99459459 0.99280576 0.99459459 0.99459459 0.99280576 0.9981982 0.99638989 0.99820467] mean value: 0.9953172538625 key: test_precision value: [0.96875 0.88235294 0.90909091 0.93939394 0.93939394 0.96666667 0.9375 1. 0.90625 0.93333333] mean value: 0.9382731729055258 key: train_precision value: [1. 0.99638989 0.99638989 0.99280576 0.99638989 0.99638989 0.99280576 1. 1. 1. ] mean value: 0.997117107757837 key: test_recall value: [1. 0.96774194 0.96774194 1. 1. 0.93548387 0.96774194 0.90322581 0.93548387 0.93333333] mean value: 0.9610752688172043 key: train_recall value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99640288 0.99280576 0.99641577] mean value: 0.9935264691472628 key: test_roc_auc value: [0.98387097 0.91935484 0.93548387 0.96774194 0.96774194 0.9516129 0.9516129 0.9516129 0.91774194 0.9344086 ] mean value: 0.9481182795698926 key: train_roc_auc value: [0.99640288 0.99460432 0.99460432 0.99280576 0.99460432 0.99460432 0.99280576 0.99820144 0.99640288 0.99820789] mean value: 0.9953243856527682 key: test_jcc value: [0.96875 0.85714286 0.88235294 0.93939394 0.93939394 0.90625 0.90909091 0.90322581 0.85294118 0.875 ] mean value: 0.9033541569120317 key: train_jcc value: [0.99280576 0.98924731 0.98924731 0.98571429 0.98924731 0.98924731 0.98571429 0.99640288 0.99280576 0.99641577] mean value: 0.9906847977838927 MCC on Blind test: 0.18 Accuracy on Blind test: 0.36 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', 
ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01465797 0.01315546 0.01135731 0.01076937 0.01108599 0.01082325 0.01092386 0.01045871 0.01094151 0.01048708] mean value: 0.011466050148010254 key: score_time value: [0.01047611 0.00827646 0.00819159 0.00810289 0.00804043 0.00810003 0.00786495 0.00790858 0.00797296 0.00794983] mean value: 0.008288383483886719 key: test_mcc value: [0.96824584 0.90369611 1. 0.90748521 0.90369611 0.87831007 0.84266484 0.96824584 0.8688172 0.87055472] mean value: 0.9111715945488771 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 0.9516129 1. 0.9516129 0.9516129 0.93548387 0.91935484 0.98387097 0.93442623 0.93442623] mean value: 0.9546271813855103 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98412698 0.95081967 1. 0.95384615 0.95081967 0.93103448 0.91525424 0.98360656 0.93548387 0.93103448] mean value: 0.9536026113385601 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96875 0.96666667 1. 0.91176471 0.96666667 1. 0.96428571 1. 0.93548387 0.96428571] mean value: 0.9677903338754856 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.93548387 1. 1. 0.93548387 0.87096774 0.87096774 0.96774194 0.93548387 0.9 ] mean value: 0.9416129032258065 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 0.9516129 1. 
0.9516129 0.9516129 0.93548387 0.91935484 0.98387097 0.9344086 0.93387097] mean value: 0.9545698924731183 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96875 0.90625 1. 0.91176471 0.90625 0.87096774 0.84375 0.96774194 0.87878788 0.87096774] mean value: 0.912523000402507 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.58 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', 
PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10264206 0.10256195 0.10940504 0.10670924 0.10698891 0.10749125 0.10634494 0.1044426 0.1049583 0.10701418] mean value: 0.10585584640502929 key: score_time value: [0.01734233 0.01776242 0.01870441 0.01858568 0.01841116 0.01849437 0.01723242 0.01806641 0.01844049 0.01706672] mean value: 0.018010640144348146 key: test_mcc value: [0.93743687 0.81325006 0.87096774 0.87278605 0.93743687 0.90369611 0.80645161 0.93743687 0.8688172 0.90215054] mean value: 0.8850429919540547 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.96774194 0.90322581 0.93548387 0.93548387 0.96774194 0.9516129 0.90322581 0.96774194 0.93442623 0.95081967] mean value: 0.9417503966155474 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96875 0.90909091 0.93548387 0.9375 0.96875 0.95238095 0.90322581 0.96666667 0.93548387 0.95081967] mean value: 0.9428151748656772 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.93939394 0.85714286 0.93548387 0.90909091 0.93939394 0.9375 0.90322581 1. 0.93548387 0.93548387] mean value: 0.9292199064376484 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 0.93548387 0.96774194 1. 0.96774194 0.90322581 0.93548387 0.93548387 0.96666667] mean value: 0.9579569892473119 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96774194 0.90322581 0.93548387 0.93548387 0.96774194 0.9516129 0.90322581 0.96774194 0.9344086 0.95107527] mean value: 0.9417741935483872 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.93939394 0.83333333 0.87878788 0.88235294 0.93939394 0.90909091 0.82352941 0.93548387 0.87878788 0.90625 ] mean value: 0.8926404102696797 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.22 Accuracy on Blind test: 0.4 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00817394 0.0080502 0.0084374 0.00861168 0.00831032 0.00796342 0.00770545 0.00855494 0.00841975 0.00792027] mean value: 0.008214735984802246 key: score_time value: [0.00823379 0.00851464 0.00845742 0.00857925 0.00831676 0.00783062 0.00798845 0.00863981 0.00799108 0.00859761] mean value: 0.008314943313598633 key: test_mcc value: [0.71004695 0.5809475 0.67883359 0.59603956 0.64549722 0.77784447 0.65372045 0.67883359 0.77072165 0.77096774] mean value: 0.68634527326083 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.85483871 0.79032258 0.83870968 0.79032258 0.82258065 0.88709677 0.82258065 0.83870968 0.8852459 0.8852459 ] mean value: 0.841565309360127 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.85245902 0.79365079 0.83333333 0.76363636 0.81967213 0.88135593 0.80701754 0.83333333 0.88888889 0.8852459 ] mean value: 0.835859323808608 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.86666667 0.78125 0.86206897 0.875 0.83333333 0.92857143 0.88461538 0.86206897 0.875 0.87096774] mean value: 0.8639542486156779 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.83870968 0.80645161 0.80645161 0.67741935 0.80645161 0.83870968 0.74193548 0.80645161 0.90322581 0.9 ] mean value: 0.8125806451612902 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.85483871 0.79032258 0.83870968 0.79032258 0.82258065 0.88709677 0.82258065 0.83870968 0.88494624 0.88548387] mean value: 0.8415591397849462 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.74285714 0.65789474 0.71428571 0.61764706 0.69444444 0.78787879 0.67647059 0.71428571 0.8 0.79411765] mean value: 0.7199881834711557 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.43 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.35455632 1.36369705 1.44976234 1.43192887 1.36655641 1.41406369 1.44773722 1.37284899 1.38293886 1.38959265] mean value: 1.3973682403564454 key: score_time value: [0.09139943 0.09957314 0.09985614 0.09845757 0.09911156 0.0994699 0.09767675 0.09422445 0.09873199 0.09957123] mean value: 0.09780721664428711 key: test_mcc value: [0.96824584 0.93548387 0.96824584 0.90748521 0.93743687 0.96824584 1. 0.96824584 0.96770777 0.8688172 ] mean value: 0.9489914273704848 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 0.96774194 0.98387097 0.9516129 0.96774194 0.98387097 1. 0.98387097 0.98360656 0.93442623] mean value: 0.9740613432046537 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98412698 0.96774194 0.98412698 0.95384615 0.96875 0.98360656 1. 0.98360656 0.98412698 0.93333333] mean value: 0.9743265489798408 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.96875 0.96774194 0.96875 0.91176471 0.93939394 1. 1. 1. 0.96875 0.93333333] mean value: 0.9658483914093496 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 1. 1. 1. 0.96774194 1. 0.96774194 1. 0.93333333] mean value: 0.9836559139784946 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 0.96774194 0.98387097 0.9516129 0.96774194 0.98387097 1. 0.98387097 0.98333333 0.9344086 ] mean value: 0.9740322580645162 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96875 0.9375 0.96875 0.91176471 0.93939394 0.96774194 1. 0.96774194 0.96875 0.875 ] mean value: 0.9505392516244034 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.35 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.87023854 0.93618393 0.92621827 0.95837045 1.00464082 0.93836141 0.98241544 0.91587925 0.8986578 0.98821497] mean value: 0.9419180870056152 key: score_time value: [0.23300123 0.2598815 0.26490426 0.22142696 0.22671819 0.23441744 0.25722957 0.27357078 0.23566699 0.21245193] mean value: 0.2419268846511841 key: test_mcc value: [0.96824584 0.87278605 0.93743687 0.90748521 0.93743687 0.96824584 0.96824584 0.96824584 0.93635873 0.83655914] mean value: 0.9301046213212982 key: train_mcc value: [0.96778244 0.97132357 0.96778244 0.97487691 0.96768225 0.96768225 0.96778244 0.96778244 0.97137553 0.9784809 ] mean value: 0.9702551166949516 key: test_accuracy value: [0.98387097 0.93548387 0.96774194 0.9516129 0.96774194 0.98387097 0.98387097 0.98387097 0.96721311 0.91803279] mean value: 0.9643310417768377 key: train_accuracy value: [0.98381295 0.98561151 0.98381295 0.98741007 0.98381295 0.98381295 0.98381295 0.98381295 0.98563734 0.98922801] mean value: 0.9850764630665306 key: test_fscore value: [0.98412698 0.9375 0.96875 0.95384615 0.96875 0.98360656 0.98412698 0.98360656 0.96875 0.91803279] mean value: 0.9651096023739466 key: train_fscore value: [0.98395722 
0.98571429 0.98395722 0.98747764 0.98389982 0.98389982 0.98395722 0.98395722 0.98571429 0.98928571] mean value: 0.985182044357831 key: test_precision value: [0.96875 0.90909091 0.93939394 0.91176471 0.93939394 1. 0.96875 1. 0.93939394 0.90322581] mean value: 0.9479763239606693 key: train_precision value: [0.97526502 0.9787234 0.97526502 0.98220641 0.97864769 0.97864769 0.97526502 0.97526502 0.9787234 0.98576512] mean value: 0.9783773783096608 key: test_recall value: [1. 0.96774194 1. 1. 1. 0.96774194 1. 0.96774194 1. 0.93333333] mean value: 0.9836559139784946 key: train_recall value: [0.99280576 0.99280576 0.99280576 0.99280576 0.98920863 0.98920863 0.99280576 0.99280576 0.99280576 0.99283154] mean value: 0.9920889095175472 key: test_roc_auc value: [0.98387097 0.93548387 0.96774194 0.9516129 0.96774194 0.98387097 0.98387097 0.98387097 0.96666667 0.91827957] mean value: 0.9643010752688173 key: train_roc_auc value: [0.98381295 0.98561151 0.98381295 0.98741007 0.98381295 0.98381295 0.98381295 0.98381295 0.98565019 0.98922153] mean value: 0.9850770996106342 key: test_jcc value: [0.96875 0.88235294 0.93939394 0.91176471 0.93939394 0.96774194 0.96875 0.96774194 0.93939394 0.84848485] mean value: 0.9333768184693232 key: train_jcc value: [0.96842105 0.97183099 0.96842105 0.97526502 0.96830986 0.96830986 0.96842105 0.96842105 0.97183099 0.97879859] mean value: 0.9708029504907444 MCC on Blind test: 0.15 Accuracy on Blind test: 0.4 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', 
ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01861191 0.00832939 0.00832176 0.00824809 0.00866604 0.00814605 0.00879598 0.00842834 0.00795603 0.00822353] mean value: 0.009372711181640625 key: score_time value: [0.00951219 0.00824142 0.00890613 0.00865459 0.00877047 0.00828552 0.00873399 0.00808811 0.00830865 0.00846767] mean value: 0.00859687328338623 key: test_mcc value: [0.64820372 0.68313005 0.48488114 0.74348441 0.80813523 0.74348441 0.64820372 0.74193548 0.63978495 0.67204301] mean value: 0.6813286129520032 key: train_mcc value: [0.69129181 0.69623388 0.69785979 0.69872831 0.69209976 0.70569372 0.7019886 0.70220704 0.69929441 0.69881448] mean value: 0.698421180066379 key: test_accuracy value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774 0.82258065 0.87096774 0.81967213 0.83606557] mean value: 0.8397673188789001 key: train_accuracy value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799 0.85071942 0.85071942 0.8491921 0.8491921 ] mean value: 0.8487952546400941 key: test_fscore value: [0.81355932 0.84848485 0.75 0.875 0.9 0.875 0.83076923 0.87096774 0.81967213 0.83333333] mean value: 0.8416786607704336 key: train_fscore value: [0.84859155 0.85268631 0.84837545 0.85263158 0.85017422 0.8556338 0.85361552 0.85413005 0.85263158 0.85211268] mean value: 0.8520582734853629 key: test_precision value: [0.85714286 0.8 0.72727273 0.84848485 0.93103448 0.84848485 0.79411765 0.87096774 0.83333333 0.83333333] mean value: 0.8344171819804876 key: train_precision value: [0.83103448 0.82274247 0.85144928 0.83219178 0.82432432 0.83793103 0.83737024 0.83505155 
0.83219178 0.83737024] mean value: 0.8341657184309065 key: test_recall value: [0.77419355 0.90322581 0.77419355 0.90322581 0.87096774 0.90322581 0.87096774 0.87096774 0.80645161 0.83333333] mean value: 0.8510752688172043 key: train_recall value: [0.86690647 0.88489209 0.84532374 0.87410072 0.87769784 0.87410072 0.8705036 0.87410072 0.87410072 0.86738351] mean value: 0.8709110131249839 key: test_roc_auc value: [0.82258065 0.83870968 0.74193548 0.87096774 0.90322581 0.87096774 0.82258065 0.87096774 0.81989247 0.83602151] mean value: 0.8397849462365592 key: train_roc_auc value: [0.84532374 0.8471223 0.84892086 0.84892086 0.84532374 0.85251799 0.85071942 0.85071942 0.84923674 0.84915938] mean value: 0.8487964467135968 key: test_jcc value: [0.68571429 0.73684211 0.6 0.77777778 0.81818182 0.77777778 0.71052632 0.77142857 0.69444444 0.71428571] mean value: 0.7286978810663021 key: train_jcc value: [0.73700306 0.74320242 0.73667712 0.74311927 0.73939394 0.74769231 0.74461538 0.74539877 0.74311927 0.74233129] mean value: 0.7422552816171282 MCC on Blind test: 0.19 Accuracy on Blind test: 0.49 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), 
('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 
'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.08428884 0.04923701 0.12757492 0.1029861 0.05474067 0.05481815 0.06141877 0.06270385 0.06345892 0.05934381] mean value: 0.07205710411071778 key: score_time value: [0.01002645 0.00963044 0.01171899 0.01000237 0.00956392 0.00953889 0.00953102 0.00952125 0.00951862 0.00952578] mean value: 0.009857773780822754 key: test_mcc value: [0.96824584 0.90369611 0.93743687 0.90748521 0.90369611 0.93743687 1. 0.96824584 0.96770777 0.8688172 ] mean value: 0.9362767824424617 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 0.9516129 0.96774194 0.9516129 0.9516129 0.96774194 1. 0.98387097 0.98360656 0.93442623] mean value: 0.9676097303014278 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98412698 0.95238095 0.96875 0.95384615 0.95238095 0.96666667 1. 0.98360656 0.98412698 0.93333333] mean value: 0.9679218584239075 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96875 0.9375 0.93939394 0.91176471 0.9375 1. 1. 1. 0.96875 0.93333333] mean value: 0.9596991978609626 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 1. 1. 0.96774194 0.93548387 1. 0.96774194 1. 0.93333333] mean value: 0.9772043010752688 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.98387097 0.9516129 0.96774194 0.9516129 0.9516129 0.96774194 1. 0.98387097 0.98333333 0.9344086 ] mean value: 0.9675806451612904 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96875 0.90909091 0.93939394 0.91176471 0.90909091 0.93548387 1. 0.96774194 0.96875 0.875 ] mean value: 0.9385066269909723 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.61 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01458907 0.04201937 0.02599144 0.01775765 0.04186487 0.0279336 0.01842332 0.04155302 0.04179025 0.01767874] mean value: 0.028960132598876955 key: score_time value: [0.01030087 0.02038527 0.01068902 0.01067185 0.01916838 0.01074195 0.01076746 0.01074457 0.02005053 0.010741 ] mean value: 0.013426089286804199 key: test_mcc value: [0.96824584 0.87278605 1. 0.90748521 0.96824584 0.96824584 1. 
0.93743687 0.87082935 0.83655914] mean value: 0.9329834129888399 key: train_mcc value: [0.95329292 0.9497386 0.95329292 0.96048758 0.94966486 0.94966486 0.93900081 0.95339163 0.95693712 0.96065614] mean value: 0.9526127442796535 key: test_accuracy value: [0.98387097 0.93548387 1. 0.9516129 0.98387097 0.98387097 1. 0.96774194 0.93442623 0.91803279] mean value: 0.9658910629296669 key: train_accuracy value: [0.97661871 0.97482014 0.97661871 0.98021583 0.97482014 0.97482014 0.96942446 0.97661871 0.97845601 0.98025135] mean value: 0.9762664195394134 key: test_fscore value: [0.98360656 0.9375 1. 0.95384615 0.98412698 0.98360656 1. 0.96666667 0.93333333 0.91803279] mean value: 0.9660719039612482 key: train_fscore value: [0.97674419 0.975 0.97674419 0.980322 0.97491039 0.97491039 0.96969697 0.97682709 0.97849462 0.98046181] mean value: 0.9764111663751257 key: test_precision value: [1. 0.90909091 1. 0.91176471 0.96875 1. 1. 1. 0.96551724 0.90322581] mean value: 0.9658348662804185 key: train_precision value: [0.97153025 0.96808511 0.97153025 0.97508897 0.97142857 0.97142857 0.96113074 0.96819788 0.975 0.97183099] mean value: 0.9705251323255912 key: test_recall value: [0.96774194 0.96774194 1. 1. 1. 0.96774194 1. 0.93548387 0.90322581 0.93333333] mean value: 0.9675268817204301 key: train_recall value: [0.98201439 0.98201439 0.98201439 0.98561151 0.97841727 0.97841727 0.97841727 0.98561151 0.98201439 0.98924731] mean value: 0.9823779685928676 key: test_roc_auc value: [0.98387097 0.93548387 1. 0.9516129 0.98387097 0.98387097 1. 0.96774194 0.93494624 0.91827957] mean value: 0.9659677419354838 key: train_roc_auc value: [0.97661871 0.97482014 0.97661871 0.98021583 0.97482014 0.97482014 0.96942446 0.97661871 0.97846239 0.98023517] mean value: 0.976265439261494 key: test_jcc value: [0.96774194 0.88235294 1. 0.91176471 0.96875 0.96774194 1. 
0.93548387 0.875 0.84848485] mean value: 0.9357320237479156 key: train_jcc value: [0.95454545 0.95121951 0.95454545 0.96140351 0.95104895 0.95104895 0.94117647 0.95470383 0.95789474 0.96167247] mean value: 0.9539259346206412 MCC on Blind test: 0.17 Accuracy on Blind test: 0.38 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02236176 0.00778937 0.00771594 0.00752807 0.0074892 0.00744605 0.00749993 0.007586 0.00749612 0.00748873] mean value: 0.009040117263793945 key: score_time value: [0.01843238 0.00818586 0.00802255 0.00780058 0.00774455 0.00785375 0.00774026 0.00784397 0.00779438 0.00780678] mean value: 0.008922505378723144 key: test_mcc value: [0.77459667 0.65372045 0.55301004 0.74819006 0.74819006 0.7190925 0.58338335 0.77459667 0.57576971 0.81062315] mean value: 0.6941172654572817 key: train_mcc value: [0.70194087 0.71536572 0.73033254 0.70140848 0.70140848 0.70194087 0.72031981 0.70528679 0.72419371 0.70094494] mean value: 0.7103142230314713 key: test_accuracy value: [0.88709677 0.82258065 0.77419355 0.87096774 0.87096774 0.85483871 0.79032258 0.88709677 0.78688525 0.90163934] mean value: 0.8446589106292967 key: train_accuracy 
value: [0.84892086 0.85611511 0.86330935 0.84892086 0.84892086 0.84892086 0.85791367 0.85071942 0.85996409 0.8491921 ] mean value: 0.8532897201090115 key: test_fscore value: [0.8852459 0.8358209 0.78787879 0.87878788 0.87878788 0.86567164 0.8 0.88888889 0.8 0.90625 ] mean value: 0.8527331873296211 key: train_fscore value: [0.85665529 0.86254296 0.86986301 0.85616438 0.85616438 0.85665529 0.86541738 0.85811966 0.8668942 0.8556701 ] mean value: 0.8604146652008448 key: test_precision value: [0.9 0.77777778 0.74285714 0.82857143 0.82857143 0.80555556 0.76470588 0.875 0.76470588 0.85294118] mean value: 0.8140686274509804 key: train_precision value: [0.81493506 0.82565789 0.83006536 0.81699346 0.81699346 0.81493506 0.82200647 0.81758958 0.82467532 0.82178218] mean value: 0.8205633864120958 key: test_recall value: [0.87096774 0.90322581 0.83870968 0.93548387 0.93548387 0.93548387 0.83870968 0.90322581 0.83870968 0.96666667] mean value: 0.8966666666666666 key: train_recall value: [0.9028777 0.9028777 0.91366906 0.89928058 0.89928058 0.9028777 0.91366906 0.9028777 0.91366906 0.89247312] mean value: 0.9043552254970217 key: test_roc_auc value: [0.88709677 0.82258065 0.77419355 0.87096774 0.87096774 0.85483871 0.79032258 0.88709677 0.78602151 0.90268817] mean value: 0.8446774193548388 key: train_roc_auc value: [0.84892086 0.85611511 0.86330935 0.84892086 0.84892086 0.84892086 0.85791367 0.85071942 0.86006034 0.84911426] mean value: 0.853291560300147 key: test_jcc value: [0.79411765 0.71794872 0.65 0.78378378 0.78378378 0.76315789 0.66666667 0.8 0.66666667 0.82857143] mean value: 0.7454696589216713 key: train_jcc value: [0.74925373 0.75830816 0.76969697 0.74850299 0.74850299 0.74925373 0.76276276 0.75149701 0.76506024 0.74774775] mean value: 0.7550586334969577 MCC on Blind test: 0.2 Accuracy on Blind test: 0.48 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01076055 0.01259518 0.01484299 0.0152657 0.01288152 0.01513839 0.01497507 0.01241827 0.01552725 0.01354051] mean value: 0.013794541358947754 key: score_time value: [0.00853276 0.01013088 0.01017213 0.01044273 0.01037955 0.01046228 0.01040554 0.01038742 0.01037264 0.01043701] mean value: 0.010172295570373534 key: test_mcc value: [0.93743687 0.81325006 0.84983659 0.87831007 0.93548387 0.96824584 0.93743687 0.90748521 0.87082935 0.70997538] mean value: 0.8808290098706804 key: train_mcc value: [0.89396219 0.81804143 0.8410572 0.96058703 0.93914669 0.95329292 0.9354697 0.94266562 0.95337563 0.78144333] mean value: 0.9019041746413544 key: test_accuracy value: [0.96774194 0.90322581 0.91935484 0.93548387 0.96774194 0.98387097 0.96774194 0.9516129 0.93442623 0.83606557] mean value: 0.9367265996827076 key: train_accuracy value: [0.94604317 0.9028777 0.91546763 0.98021583 0.96942446 0.97661871 0.9676259 0.97122302 0.97666068 0.88150808] mean value: 0.9487665164098523 key: test_fscore value: [0.96666667 0.89655172 0.92537313 0.93939394 0.96774194 0.98360656 0.96666667 0.94915254 0.93333333 0.8 ] mean value: 0.9328486499760696 key: train_fscore value: [0.94423792 0.89370079 0.92153589 0.98039216 0.96903461 0.97674419 0.96727273 0.97153025 
0.97649186 0.86746988] mean value: 0.9468410268529506 key: test_precision value: [1. 0.96296296 0.86111111 0.88571429 0.96774194 1. 1. 1. 0.96551724 1. ] mean value: 0.9643047536651541 key: train_precision value: [0.97692308 0.98695652 0.85981308 0.97173145 0.98154982 0.97153025 0.97794118 0.96126761 0.98181818 0.98630137] mean value: 0.9655832529931669 key: test_recall value: [0.93548387 0.83870968 1. 1. 0.96774194 0.96774194 0.93548387 0.90322581 0.90322581 0.66666667] mean value: 0.9118279569892473 key: train_recall value: [0.91366906 0.81654676 0.99280576 0.98920863 0.95683453 0.98201439 0.95683453 0.98201439 0.97122302 0.77419355] mean value: 0.9335344627523787 key: test_roc_auc value: [0.96774194 0.90322581 0.91935484 0.93548387 0.96774194 0.98387097 0.96774194 0.9516129 0.93494624 0.83333333] mean value: 0.936505376344086 key: train_roc_auc value: [0.94604317 0.9028777 0.91546763 0.98021583 0.96942446 0.97661871 0.9676259 0.97122302 0.97665094 0.88170109] mean value: 0.9487848430932674 key: test_jcc value: [0.93548387 0.8125 0.86111111 0.88571429 0.9375 0.96774194 0.93548387 0.90322581 0.875 0.66666667] mean value: 0.8780427547363031 key: train_jcc value: [0.8943662 0.80782918 0.85448916 0.96153846 0.93992933 0.95454545 0.93661972 0.94463668 0.9540636 0.76595745] mean value: 0.9013975235029617 MCC on Blind test: 0.13 Accuracy on Blind test: 0.31 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.0138278 0.01442528 0.01401639 0.01388168 0.01454473 0.0136025 0.0134716 0.01331687 0.01288104 0.01246238] mean value: 0.013643026351928711 key: score_time value: [0.01061082 0.01157665 0.01069307 0.01073432 0.01059294 0.01056838 0.01065302 0.01038933 0.01044464 0.01039839] mean value: 0.010666155815124511 key: test_mcc value: [0.93743687 0.87278605 0.93743687 0.90748521 0.90369611 0.87831007 1. 0.78446454 0.72318666 0.50305191] mean value: 0.8447854282682599 key: train_mcc value: [0.90882979 0.95705746 0.95025527 0.94305636 0.92239227 0.89154571 0.94604929 0.77463214 0.83507476 0.45405525] mean value: 0.858294830889454 key: test_accuracy value: [0.96774194 0.93548387 0.96774194 0.9516129 0.9516129 0.93548387 1. 0.88709677 0.85245902 0.70491803] mean value: 0.9154151242728715 key: train_accuracy value: [0.95323741 0.97841727 0.97482014 0.97122302 0.96043165 0.9442446 0.97302158 0.87589928 0.91202873 0.67504488] mean value: 0.9218368572646372 key: test_fscore value: [0.96666667 0.9375 0.96875 0.95384615 0.95081967 0.93103448 1. 0.89552239 0.86956522 0.57142857] mean value: 0.9045133152282165 key: train_fscore value: [0.95149254 0.97864769 0.97526502 0.97069597 0.95925926 0.94183865 0.97307002 0.88924559 0.91846922 0.52493438] mean value: 0.908291832592524 key: test_precision value: [1. 0.90909091 0.93939394 0.91176471 0.96666667 1. 1. 0.83333333 0.78947368 1. 
] mean value: 0.9349723238577727 key: train_precision value: [0.98837209 0.96830986 0.95833333 0.98880597 0.98854962 0.98431373 0.97132616 0.80289855 0.85448916 0.98039216] mean value: 0.9485790636020202 key: test_recall value: [0.93548387 0.96774194 1. 1. 0.93548387 0.87096774 1. 0.96774194 0.96774194 0.4 ] mean value: 0.9045161290322581 key: train_recall value: [0.91726619 0.98920863 0.99280576 0.95323741 0.93165468 0.9028777 0.97482014 0.99640288 0.99280576 0.35842294] mean value: 0.9009502075758747 key: test_roc_auc value: [0.96774194 0.93548387 0.96774194 0.9516129 0.9516129 0.93548387 1. 0.88709677 0.85053763 0.7 ] mean value: 0.914731182795699 key: train_roc_auc value: [0.95323741 0.97841727 0.97482014 0.97122302 0.96043165 0.9442446 0.97302158 0.87589928 0.91217349 0.67561435] mean value: 0.9219082798277507 key: test_jcc value: [0.93548387 0.88235294 0.93939394 0.91176471 0.90625 0.87096774 1. 0.81081081 0.76923077 0.4 ] mean value: 0.8426254779397568 key: train_jcc value: [0.90747331 0.95818815 0.95172414 0.9430605 0.92170819 0.89007092 0.94755245 0.80057803 0.84923077 0.35587189] mean value: 0.8525458343695811 MCC on Blind test: 0.14 Accuracy on Blind test: 0.32 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.11406326 0.10405397 0.10169864 0.10252857 0.09923482 0.10144997 0.09957933 0.10238481 0.10498977 0.10294104] mean value: 0.10329241752624511 key: score_time value: [0.01416016 0.01535344 0.01559019 0.01440263 0.01463914 0.01422262 0.01545978 0.01572537 0.01503325 0.0141983 ] mean value: 0.014878487586975098 key: test_mcc value: [0.96824584 0.93548387 0.96824584 0.90748521 0.90748521 0.93743687 1. 0.90369611 1. 0.8688172 ] mean value: 0.9396896154994742 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194 1. 0.9516129 1. 0.93442623] mean value: 0.9692490745637229 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98412698 0.96774194 0.98360656 0.95384615 0.95384615 0.96666667 1. 0.95081967 1. 0.93333333] mean value: 0.9693987456811359 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96875 0.96774194 1. 0.91176471 0.91176471 1. 1. 0.96666667 1. 0.93333333] mean value: 0.9660021347248577 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 0.96774194 1. 1. 0.93548387 1. 0.93548387 1. 0.93333333] mean value: 0.9739784946236559 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 0.96774194 0.98387097 0.9516129 0.9516129 0.96774194 1. 0.9516129 1. 
0.9344086 ] mean value: 0.969247311827957 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96875 0.9375 0.96774194 0.91176471 0.91176471 0.93548387 1. 0.90625 1. 0.875 ] mean value: 0.9414255218216319 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.31 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', 
SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03851056 0.03913617 0.03798318 0.04739237 0.0397296 0.04984927 0.04227161 0.05067539 0.05400753 0.04927731] mean value: 0.04488329887390137 key: score_time value: [0.02179551 0.02289391 0.02226377 0.01712132 0.03155065 0.0246129 0.03463507 0.02148271 0.02362227 0.01659489] mean value: 0.02365729808807373 key: test_mcc value: [1. 0.90369611 1. 0.93743687 0.87096774 0.90748521 0.83914639 0.96824584 0.93635873 0.90204573] mean value: 0.9265382629263172 key: train_mcc value: [0.99640932 0.99640932 0.99280576 0.99640932 0.98563702 0.99280576 0.99640932 0.99640932 0.98923442 0.99284434] mean value: 0.9935373910332435 key: test_accuracy value: [1. 0.9516129 1. 0.96774194 0.93548387 0.9516129 0.91935484 0.98387097 0.96721311 0.95081967] mean value: 0.9627710206240084 key: train_accuracy value: [0.99820144 0.99820144 0.99640288 0.99820144 0.99280576 0.99640288 0.99820144 0.99820144 0.994614 0.99640934] mean value: 0.9967642044353745 key: test_fscore value: [1. 0.95081967 1. 
0.96875 0.93548387 0.94915254 0.91803279 0.98360656 0.96875 0.94915254] mean value: 0.9623747972106947 key: train_fscore value: [0.9981982 0.9981982 0.99640288 0.9981982 0.99277978 0.99640288 0.99820467 0.9981982 0.994614 0.99640288] mean value: 0.9967599880734039 key: test_precision value: [1. 0.96666667 1. 0.93939394 0.93548387 1. 0.93333333 1. 0.93939394 0.96551724] mean value: 0.9679788991134931 key: train_precision value: [1. 1. 0.99640288 1. 0.99637681 0.99640288 0.99641577 1. 0.99283154 1. ] mean value: 0.9978429878817844 key: test_recall value: [1. 0.93548387 1. 1. 0.93548387 0.90322581 0.90322581 0.96774194 1. 0.93333333] mean value: 0.9578494623655914 key: train_recall value: [0.99640288 0.99640288 0.99640288 0.99640288 0.98920863 0.99640288 1. 0.99640288 0.99640288 0.99283154] mean value: 0.9956860318197055 key: test_roc_auc value: [1. 0.9516129 1. 0.96774194 0.93548387 0.9516129 0.91935484 0.98387097 0.96666667 0.95053763] mean value: 0.9626881720430108 key: train_roc_auc value: [0.99820144 0.99820144 0.99640288 0.99820144 0.99280576 0.99640288 0.99820144 0.99820144 0.99461721 0.99641577] mean value: 0.996765168510353 key: test_jcc value: [1. 0.90625 1. 
0.93939394 0.87878788 0.90322581 0.84848485 0.96774194 0.93939394 0.90322581] mean value: 0.9286504154447703 key: train_jcc value: [0.99640288 0.99640288 0.99283154 0.99640288 0.98566308 0.99283154 0.99641577 0.99640288 0.98928571 0.99283154] mean value: 0.9935470701779591 MCC on Blind test: 0.07 Accuracy on Blind test: 0.58 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.12759042 0.22521901 0.21887374 0.2211132 0.17997479 0.20122313 0.19672465 0.20488429 0.276335 0.25733685] mean value: 0.21092751026153564 key: score_time value: [0.01269174 0.02497721 0.02092695 0.02029276 0.01257658 0.0126636 0.01265192 0.02021074 0.02772164 0.02012014] mean value: 0.0184833288192749 key: test_mcc value: [0.90748521 0.61807005 0.7130241 0.80813523 0.77784447 0.77459667 0.61807005 0.80645161 0.57576971 0.70780713] mean value: 0.7307254226729265 key: train_mcc value: [0.87086426 0.86386843 0.84312418 0.83904739 0.85318614 0.85376169 0.85720277 0.84009387 0.86412027 0.86022912] mean value: 0.8545498119930119 key: test_accuracy value: [0.9516129 0.80645161 0.85483871 0.90322581 0.88709677 0.88709677 0.80645161 
0.90322581 0.78688525 0.85245902] mean value: 0.8639344262295082 key: train_accuracy value: [0.9352518 0.93165468 0.92086331 0.91906475 0.92625899 0.92625899 0.92805755 0.91906475 0.93177738 0.92998205] mean value: 0.9268234245637601 key: test_fscore value: [0.94915254 0.81818182 0.86153846 0.90625 0.89230769 0.88888889 0.81818182 0.90322581 0.8 0.84210526] mean value: 0.8679832291081068 key: train_fscore value: [0.93617021 0.93286219 0.92307692 0.92091388 0.92768959 0.92819615 0.92982456 0.92173913 0.93286219 0.93097345] mean value: 0.9284308286107671 key: test_precision value: [1. 0.77142857 0.82352941 0.87878788 0.85294118 0.875 0.77142857 0.90322581 0.76470588 0.88888889] mean value: 0.8529936187573759 key: train_precision value: [0.92307692 0.91666667 0.89795918 0.90034364 0.9100346 0.90443686 0.90753425 0.89225589 0.91666667 0.91958042] mean value: 0.9088555103251448 key: test_recall value: [0.90322581 0.87096774 0.90322581 0.93548387 0.93548387 0.90322581 0.87096774 0.90322581 0.83870968 0.8 ] mean value: 0.8864516129032258 key: train_recall value: [0.94964029 0.94964029 0.94964029 0.94244604 0.94604317 0.95323741 0.95323741 0.95323741 0.94964029 0.94265233] mean value: 0.9489414919677162 key: test_roc_auc value: [0.9516129 0.80645161 0.85483871 0.90322581 0.88709677 0.88709677 0.80645161 0.90322581 0.78602151 0.8516129 ] mean value: 0.8637634408602151 key: train_roc_auc value: [0.9352518 0.93165468 0.92086331 0.91906475 0.92625899 0.92625899 0.92805755 0.91906475 0.93180939 0.92995926] mean value: 0.9268243469740336 key: test_jcc value: [0.90322581 0.69230769 0.75675676 0.82857143 0.80555556 0.8 0.69230769 0.82352941 0.66666667 0.72727273] mean value: 0.7696193737654838 key: train_jcc value: [0.88 0.87417219 0.85714286 0.8534202 0.86513158 0.86601307 0.86885246 0.85483871 0.87417219 0.87086093] mean value: 0.8664604170132448 MCC on Blind test: 0.22 Accuracy on Blind test: 0.49 Model_name: Gradient Boosting Model func: 
GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge 
ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.26798725 0.26752782 0.26489639 0.26649332 0.25984526 0.26162148 0.2612102 0.26817322 0.26268578 0.26635146] mean value: 0.26467921733856203 key: score_time value: [0.00845337 0.00842595 0.00839472 0.0083878 0.00851393 0.00833416 0.00913382 0.00835061 0.00875974 0.00896358] mean value: 0.008571767807006836 key: test_mcc value: [1. 0.90369611 1. 0.93743687 0.93743687 0.90748521 0.96824584 0.96824584 0.96770777 0.8688172 ] mean value: 0.9459071710309553 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9516129 1. 0.96774194 0.96774194 0.9516129 0.98387097 0.98387097 0.98360656 0.93442623] mean value: 0.9724484399788472 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95081967 1. 0.96875 0.96875 0.94915254 0.98412698 0.98360656 0.98412698 0.93333333] mean value: 0.972266607346838 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.96666667 1. 0.93939394 0.93939394 1. 0.96875 1. 
0.96875 0.93333333] mean value: 0.9716287878787879 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.93548387 1. 1. 1. 0.90322581 1. 0.96774194 1. 0.93333333] mean value: 0.9739784946236559 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9516129 1. 0.96774194 0.96774194 0.9516129 0.98387097 0.98387097 0.98333333 0.9344086 ] mean value: 0.9724193548387097 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90625 1. 0.93939394 0.93939394 0.90322581 0.96875 0.96774194 0.96875 0.875 ] mean value: 0.9468505620723363 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.14 Accuracy on Blind test: 0.63 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12,
num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01149559 0.01360273 0.01408195 0.01396298 0.0143764 0.01411986 0.01366615 0.01368761 0.01428699 0.01439714] mean value: 0.013767743110656738 key: score_time value: [0.01090598 0.01095629 0.01090288 0.01166439 0.01109648 0.01160717 0.01097107 0.01158404 0.01094365 0.01160645] mean value: 0.011223840713500976 key: test_mcc value: [0.3799803 0.51119863 0.54006172 0.74161985 0.56853524 0.56493268 0.50083542 0.43852901 0.72318666 0.76533557] mean value: 0.5734215093600435 key: train_mcc value: [0.4932785 0.76196204 0.69278522 0.72409686 0.56120987 0.54686874 0.76885315 0.49611447 0.76738608 0.73356387] mean value: 0.6546118797369623 key: test_accuracy value: [0.64516129 0.74193548 0.72580645 0.85483871 0.75806452 0.74193548 0.74193548 0.66129032 0.85245902 0.86885246] mean value: 0.759227921734532 key: train_accuracy value: [0.69784173 0.87230216 0.82553957 0.8471223 0.75359712 0.73021583 0.87410072 0.69964029 0.87791741 0.85098743] mean value: 0.8029264559626984 key: test_fscore value: [0.73170732 0.77777778 0.78481013 0.87323944 0.69387755 0.79487179 0.77142857 0.74698795 0.86956522 0.88235294] mean value: 0.7926618685748723 key: train_fscore value: [0.76731302 0.88455285 0.85099846 0.86614173 0.68649886 0.78753541 0.88709677 0.76837725 0.88741722 0.87010955] mean value: 0.8256041120420929 key: test_precision value: [0.58823529 0.68292683 0.64583333 0.775 0.94444444 0.65957447 0.69230769 0.59615385 0.78947368 0.78947368] mean value: 0.7163423276131415 key: train_precision value: [0.62387387 0.80712166 0.74262735 0.77030812 
0.94339623 0.64953271 0.80409357 0.62528217 0.82208589 0.77222222] mean value: 0.756054378747134 key: test_recall value: [0.96774194 0.90322581 1. 1. 0.5483871 1. 0.87096774 1. 0.96774194 1. ] mean value: 0.9258064516129032 key: train_recall value: [0.99640288 0.97841727 0.99640288 0.98920863 0.53956835 1. 0.98920863 0.99640288 0.96402878 0.99641577] mean value: 0.9446056058379103 key: test_roc_auc value: [0.64516129 0.74193548 0.72580645 0.85483871 0.75806452 0.74193548 0.74193548 0.66129032 0.85053763 0.87096774] mean value: 0.759247311827957 key: train_roc_auc value: [0.69784173 0.87230216 0.82553957 0.8471223 0.75359712 0.73021583 0.87410072 0.69964029 0.87807174 0.85072587] mean value: 0.8029157319305846 key: test_jcc value: [0.57692308 0.63636364 0.64583333 0.775 0.53125 0.65957447 0.62790698 0.59615385 0.76923077 0.78947368] mean value: 0.660770979104448 key: train_jcc value: [0.62247191 0.79300292 0.74064171 0.76388889 0.52264808 0.64953271 0.79710145 0.62387387 0.79761905 0.7700831 ] mean value: 0.7080863692848516 MCC on Blind test: 0.18 Accuracy on Blind test: 0.6 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', 
colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.02095389 0.03017879 0.03077292 0.03034782 0.03044152 0.03048611 0.03019238 0.03031898 0.0302968 0.03035188] mean value: 0.02943410873413086 key: score_time value: [0.0190351 0.02024627 0.02113628 0.01070428 0.01898575 0.01937699 0.02058935 0.01084757 0.0107224 0.01985407] mean value: 0.017149806022644043 key: test_mcc value: [0.96824584 0.81325006 0.83914639 0.87831007 0.96824584 0.93548387 0.90369611 0.93743687 0.80516731 0.8688172 ] mean value: 0.8917799559713326 key: train_mcc value: [0.93900081 0.93890359 0.91007783 0.9352518 0.92088714 0.92808157 0.91007783 0.92805755 0.92820949 0.93182991] mean value: 0.9270377524969889 key: test_accuracy value: [0.98387097 0.90322581 0.91935484 0.93548387 0.98387097 0.96774194 0.9516129 0.96774194 0.90163934 0.93442623] mean value: 0.9448968799576943 key: train_accuracy value: [0.96942446 0.96942446 0.95503597 0.9676259 0.96043165 0.96402878 0.95503597 0.96402878 0.96409336 0.96588869] mean value: 0.9635018017901656 key: test_fscore value: [0.98412698 0.90909091 0.92063492 0.93939394 0.98412698 0.96774194 0.95081967 0.96666667 0.9 0.93333333] mean value: 0.9455935344988755 key: train_fscore value: [0.96969697 0.96958855 0.95495495 0.9676259 0.96057348 0.96415771 0.95495495 0.96402878 0.96389892 0.96613191] mean value: 0.9635612113921358 key: test_precision value: [0.96875 0.85714286 0.90625 0.88571429 0.96875 0.96774194 0.96666667 1. 
0.93103448 0.93333333]
mean value: 0.9385383561099634
key: train_precision
value: [0.96113074 0.96441281 0.9566787 0.9676259 0.95714286 0.96071429 0.9566787 0.96402878 0.9673913 0.96099291]
mean value: 0.9616796985424773
key: test_recall
value: [1. 0.96774194 0.93548387 1. 1. 0.96774194 0.93548387 0.93548387 0.87096774 0.93333333]
mean value: 0.9546236559139785
key: train_recall
value: [0.97841727 0.97482014 0.95323741 0.9676259 0.96402878 0.9676259 0.95323741 0.96402878 0.96043165 0.97132616]
mean value: 0.9654779402284623
key: test_roc_auc
value: [0.98387097 0.90322581 0.91935484 0.93548387 0.98387097 0.96774194 0.9516129 0.96774194 0.90215054 0.9344086 ]
mean value: 0.9449462365591399
key: train_roc_auc
value: [0.96942446 0.96942446 0.95503597 0.9676259 0.96043165 0.96402878 0.95503597 0.96402878 0.9640868 0.96587891]
mean value: 0.9635001676078492
key: test_jcc
value: [0.96875 0.83333333 0.85294118 0.88571429 0.96875 0.9375 0.90625 0.93548387 0.81818182 0.875 ]
mean value: 0.8981904484667768
key: train_jcc
value: [0.94117647 0.94097222 0.9137931 0.93728223 0.92413793 0.93079585 0.9137931 0.93055556 0.93031359 0.93448276]
mean value: 0.9297302811483933

MCC on Blind test: 0.23
Accuracy on Blind test: 0.47

Model_name: Ridge ClassifierCV
Model func: RidgeClassifierCV(cv=10)
List of models:
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:143: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True)
/home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:146: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline:
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))])

key: fit_time
value: [0.20507646 0.19756031 0.19703841 0.19701099 0.20378065 0.19817948 0.19686937 0.19685602 0.19804025 0.20452499]
mean value: 0.1994936943054199
key: score_time
value: [0.01948881 0.02093601 0.01906753 0.02151203 0.02097845 0.01082182 0.02040362 0.01091933 0.02004528 0.01085353]
mean value: 0.017502641677856444
key: test_mcc
value: [0.96824584 0.84266484 0.90369611 0.90748521 0.96824584 0.96824584 0.96824584 0.93743687 0.87082935 0.83655914]
mean value: 0.9171654872995563
key: train_mcc
value: [0.94254361 0.94619622 0.94609826 0.94966486 0.94609826 0.94966486 0.93890359 0.95339163 0.95691189 0.9534734 ]
mean value: 0.948294657254694
key: test_accuracy
value: [0.98387097 0.91935484 0.9516129 0.9516129 0.98387097 0.98387097 0.98387097 0.96774194 0.93442623 0.91803279]
mean value: 0.9578265468006346
key: train_accuracy
value: [0.97122302 0.97302158 0.97302158 0.97482014 0.97302158 0.97482014 0.96942446 0.97661871 0.97845601 0.97666068]
mean value: 0.9741087919610452
key: test_fscore
value: [0.98412698 0.92307692 0.95238095 0.95384615 0.98412698 0.98360656 0.98412698 0.96666667 0.93333333 0.91803279]
mean value: 0.9583324325947277
key: train_fscore
value: [0.97142857 0.97326203 0.97316637 0.97491039 0.97316637 0.97491039 0.96958855 0.97682709 0.97841727 0.97690941]
mean value: 0.9742586454574466
key: test_precision
value: [0.96875 0.88235294 0.9375 0.91176471 0.96875 1. 0.96875 1. 0.96551724 0.90322581]
mean value: 0.9506610694889747
key: train_precision
value: [0.96453901 0.96466431 0.96797153 0.97142857 0.96797153 0.97142857 0.96441281 0.96819788 0.97841727 0.96830986]
mean value: 0.9687341337990163
key: test_recall
value: [1. 0.96774194 0.96774194 1. 1. 0.96774194 1. 0.93548387 0.90322581 0.93333333]
mean value: 0.9675268817204301
key: train_recall
value: [0.97841727 0.98201439 0.97841727 0.97841727 0.97841727 0.97841727 0.97482014 0.98561151 0.97841727 0.98566308]
mean value: 0.9798612722725046
key: test_roc_auc
value: [0.98387097 0.91935484 0.9516129 0.9516129 0.98387097 0.98387097 0.98387097 0.96774194 0.93494624 0.91827957]
mean value: 0.9579032258064516
key: train_roc_auc
value: [0.97122302 0.97302158 0.97302158 0.97482014 0.97302158 0.97482014 0.96942446 0.97661871 0.97845594 0.97664449]
mean value: 0.9741071658801991
key: test_jcc
value: [0.96875 0.85714286 0.90909091 0.91176471 0.96875 0.96774194 0.96875 0.93548387 0.875 0.84848485]
mean value: 0.921095912705258
key: train_jcc
value: [0.94444444 0.94791667 0.94773519 0.95104895 0.94773519 0.95104895 0.94097222 0.95470383 0.95774648 0.95486111]
mean value: 0.9498213041443461

MCC on Blind test: 0.2
Accuracy on Blind test: 0.44

Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline:
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))])

key: fit_time
value: [0.04743552 0.02349114 0.02685213 0.02750683 0.0252192 0.0279355 0.03556037 0.04037642 0.03911209 0.03533268]
mean value: 0.032882189750671385
key: score_time
value: [0.01078486 0.01099777 0.01306605 0.01077437 0.01067662 0.01066399 0.01071048 0.01073122 0.01087689 0.01084948]
mean value: 0.011013174057006836
key: test_mcc
value: [0.96824584 0.7130241 0.83914639 0.90748521 0.79471941 0.93548387 0.71004695 0.80813523 0.77096774 0.87082935]
mean value: 0.8318084093587729
key: train_mcc
value: [0.87424213 0.85278837 0.83904739 0.84537297 0.85265591 0.84192273 0.83904739 0.85646981 0.84627216 0.84586123]
mean value: 0.8493680080976538
key: test_accuracy
value: [0.98387097 0.85483871 0.91935484 0.9516129 0.88709677 0.96774194 0.85483871 0.90322581 0.8852459 0.93442623]
mean value: 0.9142252776308831
key: train_accuracy
value: [0.93705036 0.92625899 0.91906475 0.92266187 0.92625899 0.92086331 0.91906475 0.92805755 0.92280072 0.92280072]
mean value: 0.9244882011805278
key: test_fscore
value: [0.98412698 0.86153846 0.92063492 0.95384615 0.89855072 0.96774194 0.85714286 0.9 0.8852459 0.93548387]
mean value: 0.9164311810018015
key: train_fscore
value: [0.93761141 0.92717584 0.92091388 0.92307692 0.92691622 0.92170819 0.92091388 0.92907801 0.92416226 0.92389381]
mean value: 0.9255450426062092
key: test_precision
value: [0.96875 0.82352941 0.90625 0.91176471 0.81578947 0.96774194 0.84375 0.93103448 0.9 0.90625 ]
mean value: 0.8974860009573761
key: train_precision
value: [0.92932862 0.91578947 0.90034364 0.91814947 0.91872792 0.91197183 0.90034364 0.91608392 0.90657439 0.91258741]
mean value: 0.9129900316323134
key: test_recall
value: [1. 0.90322581 0.93548387 1. 1. 0.96774194 0.87096774 0.87096774 0.87096774 0.96666667]
mean value: 0.9386021505376344
key: train_recall
value: [0.94604317 0.93884892 0.94244604 0.92805755 0.9352518 0.93165468 0.94244604 0.94244604 0.94244604 0.93548387]
mean value: 0.9385124158737526
key: test_roc_auc
value: [0.98387097 0.85483871 0.91935484 0.9516129 0.88709677 0.96774194 0.85483871 0.90322581 0.88548387 0.93494624]
mean value: 0.9143010752688172
key: train_roc_auc
value: [0.93705036 0.92625899 0.91906475 0.92266187 0.92625899 0.92086331 0.91906475 0.92805755 0.92283592 0.92277791]
mean value: 0.9244894407055001
key: test_jcc
value: [0.96875 0.75675676 0.85294118 0.91176471 0.81578947 0.9375 0.75 0.81818182 0.79411765 0.87878788]
mean value: 0.8484589456822429
key: train_jcc
value: [0.88255034 0.86423841 0.8534202 0.85714286 0.86378738 0.85478548 0.8534202 0.86754967 0.85901639 0.85855263]
mean value: 0.8614463542047712

MCC on Blind test: 0.21
Accuracy on Blind test: 0.53

Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, 
predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.72030234 0.71688986 0.82023787 0.6914432 0.76275826 0.84668159 0.7217288 0.70235157 0.860708 0.69467163] mean value: 0.7537773132324219 key: score_time value: [0.01084113 0.01207328 0.01228642 0.01954889 0.0122242 0.01225781 0.01232028 0.0122869 0.01229262 0.01234746] mean value: 0.012847900390625 key: test_mcc value: [0.93743687 0.90369611 1. 0.90369611 0.87096774 0.93548387 0.90369611 0.87278605 0.93649139 0.87082935] mean value: 0.9135083615653431 key: train_mcc value: [0.95329292 0.94634322 0.95685929 0.95685929 0.97482645 0.96043787 0.93195016 0.96048758 0.96050901 0.98205307] mean value: 0.9583618868215811 key: test_accuracy value: [0.96774194 0.9516129 1. 0.9516129 0.93548387 0.96774194 0.9516129 0.93548387 0.96721311 0.93442623] mean value: 0.9562929666842941 key: train_accuracy value: [0.97661871 0.97302158 0.97841727 0.97841727 0.98741007 0.98021583 0.96582734 0.98021583 0.98025135 0.99102334] mean value: 0.9791418570708963 key: test_fscore value: [0.96875 0.95238095 1. 0.95238095 0.93548387 0.96774194 0.95238095 0.93333333 0.96666667 0.93548387] mean value: 0.9564602534562212 key: train_fscore value: [0.97674419 0.97335702 0.97833935 0.97849462 0.98738739 0.98025135 0.96625222 0.980322 0.98025135 0.99102334] mean value: 0.9792422819398573 key: test_precision value: [0.93939394 0.9375 1. 0.9375 0.93548387 0.96774194 0.9375 0.96551724 1. 
0.90625 ] mean value: 0.9526886987224863 key: train_precision value: [0.97153025 0.96140351 0.98188406 0.975 0.98916968 0.97849462 0.95438596 0.97508897 0.97849462 0.99280576] mean value: 0.975825742653484 key: test_recall value: [1. 0.96774194 1. 0.96774194 0.93548387 0.96774194 0.96774194 0.90322581 0.93548387 0.96666667] mean value: 0.9611827956989247 key: train_recall value: [0.98201439 0.98561151 0.97482014 0.98201439 0.98561151 0.98201439 0.97841727 0.98561151 0.98201439 0.98924731] mean value: 0.9827376808230834 key: test_roc_auc value: [0.96774194 0.9516129 1. 0.9516129 0.93548387 0.96774194 0.9516129 0.93548387 0.96774194 0.93494624] mean value: 0.9563978494623656 key: train_roc_auc value: [0.97661871 0.97302158 0.97841727 0.97841727 0.98741007 0.98021583 0.96582734 0.98021583 0.98025451 0.99102653] mean value: 0.9791424924576468 key: test_jcc value: [0.93939394 0.90909091 1. 0.90909091 0.87878788 0.9375 0.90909091 0.875 0.93548387 0.87878788] mean value: 0.9172226295210166 key: train_jcc value: [0.95454545 0.94809689 0.95759717 0.95789474 0.97508897 0.96126761 0.9347079 0.96140351 0.96126761 0.98220641] mean value: 0.959407624783067 MCC on Blind test: 0.14 Accuracy on Blind test: 0.35 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01091671 0.01010871 0.00867534 0.00845599 0.0082767 0.00828934 0.00830674 0.00770044 0.00745106 0.00766039] mean value: 0.008584141731262207 key: score_time value: [0.01369882 0.00904274 0.0089016 0.00860572 0.00860953 0.00857306 0.00860238 0.00795913 0.00795174 0.00807285] mean value: 0.009001755714416504 key: test_mcc value: [0.78446454 0.51856298 0.71004695 0.84266484 0.7190925 0.67883359 0.51639778 0.84266484 0.67204301 0.73763441] mean value: 0.7022405434817621 key: train_mcc value: [0.70405758 0.72340077 0.71605437 0.70505422 0.73033396 0.71230395 0.70505422 0.70180672 0.72391206 0.73070576] mean value: 0.7152683609552583 key: test_accuracy value: [0.88709677 0.75806452 0.85483871 0.91935484 0.85483871 0.83870968 0.75806452 0.91935484 0.83606557 0.86885246] mean value: 0.8495240613432047 key: train_accuracy value: [0.84532374 0.86151079 0.85791367 0.85251799 0.86510791 0.85611511 0.85251799 0.85071942 0.86175943 0.86535009] mean value: 0.8568836133965358 key: test_fscore value: [0.89552239 0.76923077 0.85714286 0.92307692 0.86567164 0.84375 0.75409836 0.91525424 0.83870968 0.86666667] mean value: 0.852912352133119 key: train_fscore value: [0.85901639 0.86371681 0.85968028 0.85304659 0.86631016 0.85714286 0.85304659 0.85309735 0.86371681 0.86535009] mean value: 0.8594123948387209 key: test_precision value: [0.83333333 0.73529412 0.84375 0.88235294 0.80555556 0.81818182 0.76666667 0.96428571 0.83870968 0.86666667] mean value: 0.8354796490932639 key: train_precision value: [0.78915663 0.85017422 0.84912281 0.85 0.85865724 0.85106383 0.85 
0.83972125 0.85017422 0.86690647] mean value: 0.845497666835835 key: test_recall value: [0.96774194 0.80645161 0.87096774 0.96774194 0.93548387 0.87096774 0.74193548 0.87096774 0.83870968 0.86666667] mean value: 0.8737634408602151 key: train_recall value: [0.94244604 0.87769784 0.8705036 0.85611511 0.87410072 0.86330935 0.85611511 0.86690647 0.87769784 0.86379928] mean value: 0.8748691369485058 key: test_roc_auc value: [0.88709677 0.75806452 0.85483871 0.91935484 0.85483871 0.83870968 0.75806452 0.91935484 0.83602151 0.8688172 ] mean value: 0.8495161290322581 key: train_roc_auc value: [0.84532374 0.86151079 0.85791367 0.85251799 0.86510791 0.85611511 0.85251799 0.85071942 0.86178799 0.86535288] mean value: 0.8568867486655837 key: test_jcc value: [0.81081081 0.625 0.75 0.85714286 0.76315789 0.72972973 0.60526316 0.84375 0.72222222 0.76470588] mean value: 0.747178255489014 key: train_jcc value: [0.75287356 0.76012461 0.75389408 0.74375 0.76415094 0.75 0.74375 0.74382716 0.76012461 0.76265823] mean value: 0.7535153197137231 MCC on Blind test: 0.21 Accuracy on Blind test: 0.57 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00817347 0.00792408 0.00842571 0.00832367 0.00831985 0.00922894 0.00865865 0.00853586 0.00858712 0.00871468] mean value: 0.008489203453063966 key: score_time value: [0.0080893 0.00804472 0.00845718 0.00870252 0.00849962 0.00911498 0.00868034 0.00858021 0.00858331 0.00869846] mean value: 0.00854506492614746 key: test_mcc value: [0.61807005 0.65372045 0.45374261 0.71004695 0.51856298 0.71004695 0.42023032 0.74193548 0.54251915 0.57419355] mean value: 0.5943068479116385 key: train_mcc value: [0.61176415 0.63718965 0.62604511 0.60075441 0.62596408 0.60075441 0.65528703 0.62262853 0.64839945 0.64106733] mean value: 0.6269854139141487 key: test_accuracy value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871 0.70967742 0.87096774 0.7704918 0.78688525] mean value: 0.7960602855631941 key: train_accuracy value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971 0.82733813 0.81115108 0.82405745 0.82046679] mean value: 0.8133732870077367 key: test_fscore value: [0.79310345 0.8358209 0.71186441 0.85245902 0.76923077 0.85245902 0.71875 0.87096774 0.76666667 0.78688525] mean value: 0.7958207207099356 key: train_fscore value: [0.80851064 0.82186949 0.81090909 0.79927667 0.8115942 0.79927667 0.83098592 0.81415929 0.82624113 0.82269504] mean value: 0.814551814377158 key: test_precision value: [0.85185185 0.77777778 0.75 0.86666667 0.73529412 0.86666667 0.6969697 0.87096774 0.79310345 0.77419355] mean value: 0.7983491516178162 key: train_precision value: [0.7972028 0.80622837 0.81985294 0.80363636 0.81751825 0.80363636 0.8137931 
0.80139373 0.81468531 0.81403509] mean value: 0.8091982321605484 key: test_recall value: [0.74193548 0.90322581 0.67741935 0.83870968 0.80645161 0.83870968 0.74193548 0.87096774 0.74193548 0.8 ] mean value: 0.7961290322580645 key: train_recall value: [0.82014388 0.8381295 0.80215827 0.79496403 0.8057554 0.79496403 0.84892086 0.82733813 0.8381295 0.83154122] mean value: 0.8202044815760295 key: test_roc_auc value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871 0.70967742 0.87096774 0.77096774 0.78709677] mean value: 0.7961290322580645 key: train_roc_auc value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971 0.82733813 0.81115108 0.82408267 0.82044687] mean value: 0.8133738170753719 key: test_jcc value: [0.65714286 0.71794872 0.55263158 0.74285714 0.625 0.74285714 0.56097561 0.77142857 0.62162162 0.64864865] mean value: 0.6641111891208169 key: train_jcc value: [0.67857143 0.69760479 0.68195719 0.66566265 0.68292683 0.66566265 0.71084337 0.68656716 0.70392749 0.69879518] mean value: 0.6872518746851146 MCC on Blind test: 0.18 Accuracy on Blind test: 0.52 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00818872 0.00765157 0.00800991 0.00798917 0.00798535 0.00793529 0.00807238 0.00833344 0.00816226 0.00822878] mean value: 0.008055686950683594 key: score_time value: [0.01362538 0.01161528 0.01154208 0.01181722 0.01149821 0.01183558 0.01181483 0.01180124 0.01576948 0.01188374] mean value: 0.012320303916931152 key: test_mcc value: [0.7130241 0.61418277 0.5483871 0.77459667 0.51856298 0.74348441 0.58834841 0.61807005 0.60818119 0.57576971] mean value: 0.6302607385125394 key: train_mcc value: [0.7014797 0.74464768 0.73388892 0.71949894 0.75180343 0.71341277 0.73033396 0.70918848 0.73474672 0.73420349] mean value: 0.7273204091578028 key: test_accuracy value: [0.85483871 0.80645161 0.77419355 0.88709677 0.75806452 0.87096774 0.79032258 0.80645161 0.80327869 0.78688525] mean value: 0.8138551031200423 key: train_accuracy value: [0.85071942 0.87230216 0.86690647 0.85971223 0.87589928 0.85611511 0.86510791 0.85431655 0.86714542 0.86535009] mean value: 0.8633574648360306 key: test_fscore value: [0.84745763 0.8125 0.77419355 0.8852459 0.76923077 0.86666667 0.80597015 0.79310345 0.8 0.77192982] mean value: 0.8126297935133517 key: train_fscore value: [0.84990958 0.8716094 0.86594203 0.85869565 0.87567568 0.85185185 0.86388385 0.85137615 0.86446886 0.85875706] mean value: 0.8612170116983378 key: test_precision value: [0.89285714 0.78787879 0.77419355 0.9 0.73529412 0.89655172 0.75 0.85185185 0.82758621 0.81481481] mean value: 0.8231028194471236 key: train_precision value: [0.85454545 0.87636364 0.87226277 0.8649635 0.87725632 0.8778626 0.87179487 
0.86891386 0.88059701 0.9047619 ] mean value: 0.8749321930550784 key: test_recall value: [0.80645161 0.83870968 0.77419355 0.87096774 0.80645161 0.83870968 0.87096774 0.74193548 0.77419355 0.73333333] mean value: 0.8055913978494623 key: train_recall value: [0.84532374 0.86690647 0.85971223 0.85251799 0.87410072 0.82733813 0.85611511 0.83453237 0.84892086 0.8172043 ] mean value: 0.848267192697455 key: test_roc_auc value: [0.85483871 0.80645161 0.77419355 0.88709677 0.75806452 0.87096774 0.79032258 0.80645161 0.80376344 0.78602151] mean value: 0.8138172043010753 key: train_roc_auc value: [0.85071942 0.87230216 0.86690647 0.85971223 0.87589928 0.85611511 0.86510791 0.85431655 0.86711276 0.86543668] mean value: 0.8633628581006163 key: test_jcc value: [0.73529412 0.68421053 0.63157895 0.79411765 0.625 0.76470588 0.675 0.65714286 0.66666667 0.62857143] mean value: 0.6862288073123987 key: train_jcc value: [0.73899371 0.7724359 0.76357827 0.75238095 0.77884615 0.74193548 0.76038339 0.74121406 0.76129032 0.75247525] mean value: 0.7563533487181033 MCC on Blind test: 0.16 Accuracy on Blind test: 0.57 Model_name: SVM Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01835275 0.01664305 0.01745725 0.01704788 0.01895404 0.01934195 0.01825833 0.01938081 0.01911664 0.01921248] mean value: 0.0183765172958374 key: score_time value: [0.00940108 0.01006126 0.00929928 0.00980639 0.01032066 0.01062155 0.01041937 0.01055121 0.01047111 0.01046944] mean value: 0.010142135620117187 key: test_mcc value: [0.96824584 0.66226618 0.62471615 0.7190925 0.7284928 0.80813523 0.50083542 0.80645161 0.63939757 0.81978229] mean value: 0.7277415590359753 key: train_mcc value: [0.85345163 0.77632088 0.79541168 0.777078 0.76906554 0.75930753 0.79995316 0.76580581 0.77932355 0.78519796] mean value: 0.78609157351081 key: test_accuracy value: [0.98387097 0.82258065 0.80645161 0.85483871 0.85483871 0.90322581 0.74193548 0.90322581 0.81967213 0.90163934] mean value: 0.859227921734532 key: train_accuracy value: [0.92625899 0.88489209 0.89568345 0.88489209 0.88129496 0.87589928 0.89748201 0.8794964 0.88689408 0.89048474] mean value: 0.890327809565633 key: test_fscore value: [0.98412698 0.84057971 0.82352941 0.86567164 0.86956522 0.90625 0.77142857 0.90322581 0.82539683 0.90909091] mean value: 0.8698865077586886 key: train_fscore value: [0.92794376 0.89189189 0.90068493 0.89225589 0.88851351 0.88403361 0.90289608 0.88701518 0.89303905 0.89608177] mean value: 0.8964355683391803 key: test_precision value: [0.96875 0.76315789 0.75675676 0.80555556 0.78947368 0.87878788 0.69230769 0.90322581 0.8125 0.83333333] mean value: 0.8203848602140198 key: train_precision value: [0.90721649 0.84076433 0.85947712 0.83860759 0.83757962 0.829653 
0.85760518 0.83492063 0.84565916 0.8538961 ] mean value: 0.8505379240652493 key: test_recall value: [1. 0.93548387 0.90322581 0.93548387 0.96774194 0.93548387 0.87096774 0.90322581 0.83870968 1. ] mean value: 0.9290322580645161 key: train_recall value: [0.94964029 0.94964029 0.94604317 0.95323741 0.94604317 0.94604317 0.95323741 0.94604317 0.94604317 0.94265233] mean value: 0.9478623552770686 key: test_roc_auc value: [0.98387097 0.82258065 0.80645161 0.85483871 0.85483871 0.90322581 0.74193548 0.90322581 0.81935484 0.90322581] mean value: 0.8593548387096774 key: train_roc_auc value: [0.92625899 0.88489209 0.89568345 0.88489209 0.88129496 0.87589928 0.89748201 0.8794964 0.88700008 0.89039091] mean value: 0.8903290271008999 key: test_jcc value: [0.96875 0.725 0.7 0.76315789 0.76923077 0.82857143 0.62790698 0.82352941 0.7027027 0.83333333] mean value: 0.7742182517083968 key: train_jcc value: [0.86557377 0.80487805 0.81931464 0.80547112 0.7993921 0.79216867 0.82298137 0.7969697 0.80674847 0.8117284 ] mean value: 0.8125226282348854 MCC on Blind test: 0.26 Accuracy on Blind test: 0.5 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.65558004 1.55461335 1.68518972 1.59982467 1.59840369 1.78556776 1.98289418 1.71361303 1.71010733 1.56034899] mean value: 1.6846142768859864 key: score_time value: [0.01405716 0.02408385 0.01391459 0.01108027 0.01359916 0.01913881 0.01201797 0.01144147 0.01147699 0.01196384] mean value: 0.014277410507202149 key: test_mcc value: [1. 0.90369611 0.93548387 0.96824584 0.93743687 0.90369611 0.93548387 0.93743687 0.87082935 0.90215054] mean value: 0.9294459430210258 key: train_mcc value: [0.99283145 0.98561151 0.99283145 0.98921503 0.99283145 0.98921503 0.98202074 0.99640932 0.99284416 0.99641577] mean value: 0.9910225917811445 key: test_accuracy value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.9516129 0.96774194 0.96774194 0.93442623 0.95081967] mean value: 0.9643310417768377 key: train_accuracy value: [0.99640288 0.99280576 0.99640288 0.99460432 0.99640288 0.99460432 0.99100719 0.99820144 0.99640934 0.99820467] mean value: 0.9955045658266923 key: test_fscore value: [1. 
0.95238095 0.96774194 0.98412698 0.96875 0.95081967 0.96774194 0.96666667 0.93333333 0.95081967] mean value: 0.9642381151737973 key: train_fscore value: [0.99638989 0.99280576 0.99638989 0.99459459 0.99638989 0.99459459 0.99102334 0.9981982 0.99638989 0.99820467] mean value: 0.9954980716751404 key: test_precision value: [1. 0.9375 0.96774194 0.96875 0.93939394 0.96666667 0.96774194 1. 0.96551724 0.93548387] mean value: 0.9648795589375401 key: train_precision value: [1. 0.99280576 1. 0.99638989 1. 0.99638989 0.98924731 1. 1. 1. ] mean value: 0.9974832850617142 key: test_recall value: [1. 0.96774194 0.96774194 1. 1. 0.93548387 0.96774194 0.93548387 0.90322581 0.96666667] mean value: 0.9644086021505376 key: train_recall value: [0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99280576 0.99640288 0.99280576 0.99641577] mean value: 0.9935264691472628 key: test_roc_auc value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.9516129 0.96774194 0.96774194 0.93494624 0.95107527] mean value: 0.9644086021505377 key: train_roc_auc value: [0.99640288 0.99280576 0.99640288 0.99460432 0.99640288 0.99460432 0.99100719 0.99820144 0.99640288 0.99820789] mean value: 0.995504241767876 key: test_jcc value: [1. 
0.90909091 0.9375 0.96875 0.93939394 0.90625 0.9375 0.93548387 0.875 0.90625 ] mean value: 0.931521871945259 key: train_jcc value: [0.99280576 0.98571429 0.99280576 0.98924731 0.99280576 0.98924731 0.98220641 0.99640288 0.99280576 0.99641577] mean value: 0.9910456984954045 MCC on Blind test: 0.15 Accuracy on Blind test: 0.35 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), 
('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01399827 0.01331663 0.01154613 0.01056981 0.01088071 0.00967002 0.00976062 0.00975561 0.0097971 0.00941205] mean value: 0.010870695114135742 key: score_time value: [0.01116037 0.00968766 0.00949836 0.00883269 0.00868964 0.00786233 0.00786996 0.00779343 0.0077889 0.0078299 ] mean value: 0.008701324462890625 key: test_mcc value: [1. 0.87096774 1. 0.96824584 0.90369611 0.87831007 0.87831007 0.96824584 0.96774194 0.90215054] mean value: 0.9337668133579895 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93548387 1. 0.98387097 0.9516129 0.93548387 0.93548387 0.98387097 0.98360656 0.95081967] mean value: 0.9660232681121099 key: train_accuracy value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.93548387 1. 0.98412698 0.95238095 0.93103448 0.93103448 0.98360656 0.98360656 0.95081967] mean value: 0.9652093559878165 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93548387 1. 0.96875 0.9375 1. 1. 1. 1. 0.93548387] mean value: 0.9777217741935483 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.93548387 1. 1. 0.96774194 0.87096774 0.87096774 0.96774194 0.96774194 0.96666667] mean value: 0.9547311827956989 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.93548387 1. 0.98387097 0.9516129 0.93548387 0.93548387 0.98387097 0.98387097 0.95107527] mean value: 0.9660752688172043 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.87878788 1. 0.96875 0.90909091 0.87096774 0.87096774 0.96774194 0.96774194 0.90625 ] mean value: 0.9340298142717498 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.01 Accuracy on Blind test: 0.2 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10142112 0.10165668 0.10126591 0.10151219 0.11299324 0.11322856 0.108289 0.1019156 0.10539865 0.1026175 ] mean value: 0.10502984523773193 key: score_time value: [0.0171802 0.0173862 0.01719642 0.01748347 0.01896811 0.01896906 0.01711893 0.01859283 0.01831841 0.01735854] mean value: 0.01785721778869629 key: test_mcc value: [1. 0.90369611 0.93548387 0.93548387 0.93743687 0.93548387 0.93743687 0.96824584 0.96770777 0.90215054] mean value: 0.9423125607021228 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9516129 0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097 0.98360656 0.95081967] mean value: 0.9708619777895293 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 0.96774194 0.96774194 0.96875 0.96774194 0.96666667 0.98360656 0.98412698 0.95081967] mean value: 0.9709576639134413 key: train_fscore value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.9375 0.96774194 0.96774194 0.93939394 0.96774194 1. 1. 0.96875 0.93548387] mean value: 0.9684353616813295 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 0.96774194 0.96774194 1. 0.96774194 0.93548387 0.96774194 1. 0.96666667] mean value: 0.9740860215053764 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9516129 0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097 0.98333333 0.95107527] mean value: 0.9708602150537635 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 0.9375 0.9375 0.93939394 0.9375 0.93548387 0.96774194 0.96875 0.90625 ] mean value: 0.9439210654936462 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.2 Accuracy on Blind test: 0.36 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00819397 0.00782299 0.00878263 0.00824928 0.007725 0.00857353 0.00780129 0.00821066 0.008883 0.00848746] mean value: 0.008272981643676758 key: score_time value: [0.00791669 0.00839043 0.00856495 0.00861716 0.00864053 0.00859261 0.00863576 0.00865197 0.00859213 0.00803781] mean value: 0.00846400260925293 key: test_mcc value: [0.81325006 0.82199494 0.83914639 0.90369611 0.87096774 0.90369611 0.81325006 0.7284928 0.74460444 0.80475071] mean value: 0.8243849367718851 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90322581 0.90322581 0.91935484 0.9516129 0.93548387 0.9516129 0.90322581 0.85483871 0.86885246 0.90163934] mean value: 0.9093072448439978 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.89655172 0.89285714 0.91803279 0.95238095 0.93548387 0.95081967 0.89655172 0.83636364 0.86206897 0.89655172] mean value: 0.9037662199516902 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96296296 1. 0.93333333 0.9375 0.93548387 0.96666667 0.96296296 0.95833333 0.92592593 0.92857143] mean value: 0.9511740484724356 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.83870968 0.80645161 0.90322581 0.96774194 0.93548387 0.93548387 0.83870968 0.74193548 0.80645161 0.86666667] mean value: 0.8640860215053763 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.90322581 0.90322581 0.91935484 0.9516129 0.93548387 0.9516129 0.90322581 0.85483871 0.86989247 0.90107527] mean value: 0.9093548387096775 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8125 0.80645161 0.84848485 0.90909091 0.87878788 0.90625 0.8125 0.71875 0.75757576 0.8125 ] mean value: 0.826289100684262 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.26 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.33802533 1.34034443 1.34689403 1.33327985 1.32815957 1.34741807 1.36258483 1.35150194 1.35220432 1.37553906] mean value: 1.3475951433181763 key: score_time value: [0.09532094 0.15330195 0.09112287 0.0915432 0.09900188 0.09554839 0.09749842 0.0989244 0.09722352 0.09352469] mean value: 0.10130102634429931 key: test_mcc value: [1. 0.90369611 0.96824584 0.96824584 0.93743687 0.96824584 1. 0.96824584 0.96770777 0.8688172 ] mean value: 0.9550641303879139 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 
0.9516129 0.98387097 0.98387097 0.96774194 0.98387097 1. 0.98387097 0.98360656 0.93442623] mean value: 0.9772871496562665 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 0.98412698 0.98412698 0.96875 0.98360656 1. 0.98360656 0.98412698 0.93333333] mean value: 0.9774058352849336 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.9375 0.96875 0.96875 0.93939394 1. 1. 1. 0.96875 0.93333333] mean value: 0.9716477272727273 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 1. 1. 1. 0.96774194 1. 0.96774194 1. 0.93333333] mean value: 0.9836559139784946 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9516129 0.98387097 0.98387097 0.96774194 0.98387097 1. 0.98387097 0.98333333 0.9344086 ] mean value: 0.9772580645161291 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 0.96875 0.96875 0.93939394 0.96774194 1. 0.96774194 0.96875 0.875 ] mean value: 0.9565218719452591 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.19 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.89337206 0.89832807 0.98853326 0.97143817 0.95260191 0.94139814 0.88369918 0.9011116 0.89748955 0.93211508] mean value: 0.9260087013244629 key: score_time value: [0.21353126 0.18774438 0.24319863 0.28244352 0.24419403 0.22724342 0.2297473 0.25225329 0.2684927 0.27329707] mean value: 0.24221456050872803 key: test_mcc value: [0.96824584 0.84266484 0.96824584 0.93743687 0.93743687 0.96824584 1. 0.93548387 0.96770777 0.8688172 ] mean value: 0.9394284931358869 key: train_mcc value: [0.96073627 0.95025527 0.97124816 0.96058703 0.96768225 0.95693359 0.96412858 0.96778244 0.95713569 0.97137405] mean value: 0.9627863336198357 key: test_accuracy value: [0.98387097 0.91935484 0.98387097 0.96774194 0.96774194 0.98387097 1. 0.96774194 0.98360656 0.93442623] mean value: 0.9692226335272343 key: train_accuracy value: [0.98021583 0.97482014 0.98561151 0.98021583 0.98381295 0.97841727 0.98201439 0.98381295 0.97845601 0.98563734] mean value: 0.9813014220580447 key: test_fscore value: [0.98412698 0.92307692 0.98412698 0.96875 0.96875 0.98360656 1. 
0.96774194 0.98412698 0.93333333] mean value: 0.9697639701652129 key: train_fscore value: [0.98046181 0.97526502 0.98566308 0.98039216 0.98389982 0.97857143 0.98214286 0.98395722 0.97864769 0.98576512] mean value: 0.9814766206153425 key: test_precision value: [0.96875 0.88235294 0.96875 0.93939394 0.93939394 1. 1. 0.96774194 0.96875 0.93333333] mean value: 0.9568466088781554 key: train_precision value: [0.96842105 0.95833333 0.98214286 0.97173145 0.97864769 0.97163121 0.9751773 0.97526502 0.96830986 0.97879859] mean value: 0.972845835273727 key: test_recall value: [1. 0.96774194 1. 1. 1. 0.96774194 1. 0.96774194 1. 0.93333333] mean value: 0.9836559139784946 key: train_recall value: [0.99280576 0.99280576 0.98920863 0.98920863 0.98920863 0.98561151 0.98920863 0.99280576 0.98920863 0.99283154] mean value: 0.9902903483664681 key: test_roc_auc value: [0.98387097 0.91935484 0.98387097 0.96774194 0.96774194 0.98387097 1. 0.96774194 0.98333333 0.9344086 ] mean value: 0.9691935483870968 key: train_roc_auc value: [0.98021583 0.97482014 0.98561151 0.98021583 0.98381295 0.97841727 0.98201439 0.98381295 0.97847528 0.9856244 ] mean value: 0.9813020551300895 key: test_jcc value: [0.96875 0.85714286 0.96875 0.93939394 0.93939394 0.96774194 1. 
0.9375 0.96875 0.875 ] mean value: 0.9422422671414608 key: train_jcc value: [0.96167247 0.95172414 0.97173145 0.96153846 0.96830986 0.95804196 0.96491228 0.96842105 0.95818815 0.97192982] mean value: 0.9636469650502072 MCC on Blind test: 0.09 Accuracy on Blind test: 0.2 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.02149725 0.00856233 0.00855422 0.00861883 0.00868273 0.00835061 0.00852489 0.00826049 0.00861621 0.008636 ] mean value: 0.009830355644226074 key: score_time value: [0.0092957 0.00866055 0.00866151 0.00840211 0.00860667 0.00837517 0.00868773 0.00860476 0.00860882 0.00852346] mean value: 0.00864264965057373 key: test_mcc value: [0.61807005 0.65372045 0.45374261 0.71004695 0.51856298 0.71004695 0.42023032 0.74193548 0.54251915 0.57419355] mean value: 0.5943068479116385 key: train_mcc value: [0.61176415 0.63718965 0.62604511 0.60075441 0.62596408 0.60075441 0.65528703 0.62262853 0.64839945 0.64106733] mean value: 0.6269854139141487 key: test_accuracy value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871 0.70967742 0.87096774 0.7704918 0.78688525] mean value: 0.7960602855631941 key: train_accuracy 
value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971 0.82733813 0.81115108 0.82405745 0.82046679] mean value: 0.8133732870077367 key: test_fscore value: [0.79310345 0.8358209 0.71186441 0.85245902 0.76923077 0.85245902 0.71875 0.87096774 0.76666667 0.78688525] mean value: 0.7958207207099356 key: train_fscore value: [0.80851064 0.82186949 0.81090909 0.79927667 0.8115942 0.79927667 0.83098592 0.81415929 0.82624113 0.82269504] mean value: 0.814551814377158 key: test_precision value: [0.85185185 0.77777778 0.75 0.86666667 0.73529412 0.86666667 0.6969697 0.87096774 0.79310345 0.77419355] mean value: 0.7983491516178162 key: train_precision value: [0.7972028 0.80622837 0.81985294 0.80363636 0.81751825 0.80363636 0.8137931 0.80139373 0.81468531 0.81403509] mean value: 0.8091982321605484 key: test_recall value: [0.74193548 0.90322581 0.67741935 0.83870968 0.80645161 0.83870968 0.74193548 0.87096774 0.74193548 0.8 ] mean value: 0.7961290322580645 key: train_recall value: [0.82014388 0.8381295 0.80215827 0.79496403 0.8057554 0.79496403 0.84892086 0.82733813 0.8381295 0.83154122] mean value: 0.8202044815760295 key: test_roc_auc value: [0.80645161 0.82258065 0.72580645 0.85483871 0.75806452 0.85483871 0.70967742 0.87096774 0.77096774 0.78709677] mean value: 0.7961290322580645 key: train_roc_auc value: [0.8057554 0.81834532 0.81294964 0.80035971 0.81294964 0.80035971 0.82733813 0.81115108 0.82408267 0.82044687] mean value: 0.8133738170753719 key: test_jcc value: [0.65714286 0.71794872 0.55263158 0.74285714 0.625 0.74285714 0.56097561 0.77142857 0.62162162 0.64864865] mean value: 0.6641111891208169 key: train_jcc value: [0.67857143 0.69760479 0.68195719 0.66566265 0.68292683 0.66566265 0.71084337 0.68656716 0.70392749 0.69879518] mean value: 0.6872518746851146 MCC on Blind test: 0.18 Accuracy on Blind test: 0.52 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, 
enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic 
GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09717035 0.05194712 0.05566955 0.05653906 0.06008887 0.0614419 0.06166148 0.06023955 0.06417036 0.05456114] mean value: 0.06234893798828125 key: score_time value: [0.01015568 0.00965595 0.00964165 0.00960851 0.00993562 0.00997877 0.01027107 0.00972724 0.00962043 0.00961185] mean value: 0.009820675849914551 key: test_mcc value: [1. 0.90369611 0.93548387 0.96824584 0.93743687 0.93743687 1. 0.96824584 0.90586325 0.8688172 ] mean value: 0.942522584980111 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.96774194 1. 
0.98387097 0.95081967 0.93442623] mean value: 0.9707826546800635 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.95238095 0.96774194 0.98412698 0.96875 0.96666667 1. 0.98360656 0.95384615 0.93333333] mean value: 0.971045258321501 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.9375 0.96774194 0.96875 0.93939394 1. 1. 1. 0.91176471 0.93333333] mean value: 0.9658483914093496 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.96774194 0.96774194 1. 1. 0.93548387 1. 0.96774194 1. 0.93333333] mean value: 0.9772043010752688 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9516129 0.96774194 0.98387097 0.96774194 0.96774194 1. 0.98387097 0.95 0.9344086 ] mean value: 0.9706989247311828 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.90909091 0.9375 0.96875 0.93939394 0.93548387 1. 0.96774194 0.91176471 0.875 ] mean value: 0.9444725360818814 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.2 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01567197 0.04181266 0.04561234 0.0498898 0.04169583 0.04362798 0.04267979 0.04328203 0.0497613 0.0470922 ] mean value: 0.04211258888244629 key: score_time value: [0.01037931 0.02170181 0.01888394 0.01488948 0.01078224 0.02180099 0.02077007 0.0193584 0.0108037 0.01081157] mean value: 0.016018152236938477 key: test_mcc value: [0.93548387 0.84266484 0.93548387 0.93743687 0.87278605 1. 0.96824584 0.87278605 0.9344086 0.8688172 ] mean value: 0.9168113188472994 key: train_mcc value: [0.93914669 0.94653932 0.93563929 0.93914669 0.94266562 0.93195016 0.93238486 0.94283651 0.93575728 0.9427658 ] mean value: 0.9388832217176918 key: test_accuracy value: [0.96774194 0.91935484 0.96774194 0.96774194 0.93548387 1. 
0.98387097 0.93548387 0.96721311 0.93442623] mean value: 0.9579058699101005 key: train_accuracy value: [0.96942446 0.97302158 0.9676259 0.96942446 0.97122302 0.96582734 0.96582734 0.97122302 0.96768402 0.97127469] mean value: 0.969255582966302 key: test_fscore value: [0.96774194 0.92307692 0.96774194 0.96875 0.9375 1. 0.98412698 0.93333333 0.96774194 0.93333333] mean value: 0.9583346380322186 key: train_fscore value: [0.96980462 0.97345133 0.96808511 0.96980462 0.97153025 0.96625222 0.9664903 0.97163121 0.96808511 0.97163121] mean value: 0.9696765956964183 key: test_precision value: [0.96774194 0.88235294 0.96774194 0.93939394 0.90909091 1. 0.96875 0.96551724 0.96774194 0.93333333] mean value: 0.9501664170825576 key: train_precision value: [0.95789474 0.95818815 0.95454545 0.95789474 0.96126761 0.95438596 0.94809689 0.95804196 0.95454545 0.96140351] mean value: 0.9566264459258345 key: test_recall value: [0.96774194 0.96774194 0.96774194 1. 0.96774194 1. 1. 0.90322581 0.96774194 0.93333333] mean value: 0.9675268817204301 key: train_recall value: [0.98201439 0.98920863 0.98201439 0.98201439 0.98201439 0.97841727 0.98561151 0.98561151 0.98201439 0.98207885] mean value: 0.9830999716355947 key: test_roc_auc value: [0.96774194 0.91935484 0.96774194 0.96774194 0.93548387 1. 0.98387097 0.93548387 0.9672043 0.9344086 ] mean value: 0.9579032258064516 key: train_roc_auc value: [0.96942446 0.97302158 0.9676259 0.96942446 0.97122302 0.96582734 0.96582734 0.97122302 0.9677097 0.97125525] mean value: 0.9692562079368764 key: test_jcc value: [0.9375 0.85714286 0.9375 0.93939394 0.88235294 1. 
0.96875 0.875 0.9375 0.875 ] mean value: 0.9210139737713268 key: train_jcc value: [0.94137931 0.94827586 0.93814433 0.94137931 0.94463668 0.9347079 0.93515358 0.94482759 0.93814433 0.94482759] mean value: 0.9411476480564737 MCC on Blind test: 0.14 Accuracy on Blind test: 0.35 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, 
oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02284217 0.00781918 0.00835109 0.00833607 0.00749111 0.00754762 0.00822759 0.00809884 0.00830936 0.00828028] mean value: 0.009530329704284668 key: score_time value: [0.00877237 0.00818181 0.00863433 0.00789332 0.00792742 0.00775814 0.00851727 0.00839686 0.00838804 0.00852823] mean value: 0.008299779891967774 key: test_mcc value: [0.74193548 0.55301004 0.55895656 0.69047575 0.60677988 0.80813523 0.46358632 0.77459667 0.57576971 0.75310667] mean value: 0.6526352311777236 key: train_mcc value: [0.67282515 0.67609995 0.67144111 0.65172831 0.66087942 0.64772254 0.68595876 0.65901019 0.68263871 0.65745214] mean value: 0.6665756264215859 key: test_accuracy value: [0.87096774 0.77419355 0.77419355 0.83870968 0.79032258 0.90322581 0.72580645 0.88709677 0.78688525 0.86885246] mean value: 0.8220253833950291 key: train_accuracy value: [0.83273381 
0.83453237 0.83273381 0.82194245 0.82733813 0.82014388 0.83992806 0.82553957 0.83842011 0.82585278] mean value: 0.8299164976815675 key: test_fscore value: [0.87096774 0.78787879 0.79411765 0.85294118 0.81690141 0.90625 0.75362319 0.88888889 0.8 0.87878788] mean value: 0.8350356717876952 key: train_fscore value: [0.84422111 0.84563758 0.84317032 0.83472454 0.83838384 0.83277592 0.84991568 0.83806344 0.84797297 0.83697479] mean value: 0.8411840193764768 key: test_precision value: [0.87096774 0.74285714 0.72972973 0.78378378 0.725 0.87878788 0.68421053 0.875 0.76470588 0.80555556] mean value: 0.7860598241318305 key: train_precision value: [0.78996865 0.79245283 0.79365079 0.7788162 0.78797468 0.778125 0.8 0.78193146 0.79936306 0.78797468] mean value: 0.789025736384194 key: test_recall value: [0.87096774 0.83870968 0.87096774 0.93548387 0.93548387 0.93548387 0.83870968 0.90322581 0.83870968 0.96666667] mean value: 0.8934408602150538 key: train_recall value: [0.90647482 0.90647482 0.89928058 0.89928058 0.89568345 0.89568345 0.90647482 0.9028777 0.9028777 0.89247312] mean value: 0.9007581031948635 key: test_roc_auc value: [0.87096774 0.77419355 0.77419355 0.83870968 0.79032258 0.90322581 0.72580645 0.88709677 0.78602151 0.87043011] mean value: 0.8220967741935484 key: train_roc_auc value: [0.83273381 0.83453237 0.83273381 0.82194245 0.82733813 0.82014388 0.83992806 0.82553957 0.83853562 0.82573296] mean value: 0.829916067146283 key: test_jcc value: [0.77142857 0.65 0.65853659 0.74358974 0.69047619 0.82857143 0.60465116 0.8 0.66666667 0.78378378] mean value: 0.7197704132672936 key: train_jcc value: [0.73043478 0.73255814 0.72886297 0.71633238 0.72173913 0.71346705 0.73900293 0.72126437 0.73607038 0.71965318] mean value: 0.7259385314063227 MCC on Blind test: 0.21 Accuracy on Blind test: 0.5 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), 
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01066303 0.01422668 0.01254988 0.01286054 0.01497436 0.01457095 0.01441455 0.01421928 0.01397729 0.01655555] mean value: 0.013901209831237793 key: score_time value: [0.00798249 0.01007485 0.01002264 0.01035452 0.01045942 0.01085925 0.01050234 0.01053119 0.01049948 0.01047754] mean value: 0.010176372528076173 key: test_mcc value: [0.84983659 0.90369611 0.96824584 0.93743687 0.83914639 1. 0.93743687 0.84266484 0.9344086 0.90215054] mean value: 0.9115022641468933 key: train_mcc value: [0.85210391 0.96048758 0.90882979 0.95324358 0.8782527 0.93534863 0.935276 0.91827075 0.93969601 0.97130001] mean value: 0.9252808948765114 key: test_accuracy value: [0.91935484 0.9516129 0.98387097 0.96774194 0.91935484 1. 0.96774194 0.91935484 0.96721311 0.95081967] mean value: 0.9547065044949762 key: train_accuracy value: [0.92266187 0.98021583 0.95323741 0.97661871 0.93705036 0.9676259 0.9676259 0.95863309 0.96947935 0.98563734] mean value: 0.9618785761337071 key: test_fscore value: [0.9122807 0.95238095 0.98412698 0.96875 0.91803279 1. 
0.96666667 0.91525424 0.96774194 0.95081967] mean value: 0.9536053936717389 key: train_fscore value: [0.91746641 0.980322 0.95486111 0.97666068 0.93383743 0.96785714 0.96750903 0.95764273 0.97001764 0.98561151] mean value: 0.961178567797733 key: test_precision value: [1. 0.9375 0.96875 0.93939394 0.93333333 1. 1. 0.96428571 0.96774194 0.93548387] mean value: 0.96464887934646 key: train_precision value: [0.98353909 0.97508897 0.92281879 0.97491039 0.98406375 0.96099291 0.97101449 0.98113208 0.95155709 0.98916968] mean value: 0.9694287238395796 key: test_recall value: [0.83870968 0.96774194 1. 1. 0.90322581 1. 0.93548387 0.87096774 0.96774194 0.96666667] mean value: 0.9450537634408602 key: train_recall value: [0.85971223 0.98561151 0.98920863 0.97841727 0.88848921 0.97482014 0.96402878 0.9352518 0.98920863 0.98207885] mean value: 0.9546827054485444 key: test_roc_auc value: [0.91935484 0.9516129 0.98387097 0.96774194 0.91935484 1. 0.96774194 0.91935484 0.9672043 0.95107527] mean value: 0.954731182795699 key: train_roc_auc value: [0.92266187 0.98021583 0.95323741 0.97661871 0.93705036 0.9676259 0.9676259 0.95863309 0.96951471 0.98564374] mean value: 0.9618827518630257 key: test_jcc value: [0.83870968 0.90909091 0.96875 0.93939394 0.84848485 1. 
0.93548387 0.84375 0.9375 0.90625 ] mean value: 0.9127413245356794 key: train_jcc value: [0.84751773 0.96140351 0.91362126 0.95438596 0.87588652 0.93771626 0.93706294 0.91872792 0.94178082 0.97163121] mean value: 0.925973413428646 MCC on Blind test: 0.13 Accuracy on Blind test: 0.39 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01376939 0.01323795 0.01437616 0.01475048 0.01206875 0.01291966 0.01519465 0.0118556 0.01220989 0.01324606] mean value: 0.013362860679626465 key: score_time value: [0.01042581 0.01045299 0.01043582 0.01047158 0.01042938 0.01043868 0.0104847 0.01037741 0.0103879 0.01042461] mean value: 0.010432887077331542 key: test_mcc value: [0.90748521 0.87278605 0.93548387 0.87096774 0.90369611 0.90369611 0.84983659 0.84983659 0.84710837 0.83638369] mean value: 0.8777280337009071 key: train_mcc value: [0.91267965 0.91482985 0.90302377 0.91827075 0.91106862 0.91267965 0.89008997 0.90161686 0.7528037 0.94982722] mean value: 0.8966890034959883 key: test_accuracy value: [0.9516129 0.93548387 0.96774194 0.93548387 0.9516129 0.9516129 0.91935484 0.91935484 0.91803279 0.91803279] mean value: 
0.9368323638286621 key: train_accuracy value: [0.95503597 0.95683453 0.95143885 0.95863309 0.95503597 0.95503597 0.94244604 0.94964029 0.86355476 0.97486535] mean value: 0.9462520827144388 key: test_fscore value: [0.95384615 0.9375 0.96774194 0.93548387 0.95238095 0.95238095 0.9122807 0.9122807 0.92537313 0.91525424] mean value: 0.9364522640184937 key: train_fscore value: [0.95667244 0.95789474 0.95099819 0.95764273 0.95395948 0.95667244 0.9391635 0.94776119 0.87898089 0.97508897] mean value: 0.9474834571073163 key: test_precision value: [0.91176471 0.90909091 0.96774194 0.93548387 0.9375 0.9375 1. 1. 0.86111111 0.93103448] mean value: 0.9391227015294606 key: train_precision value: [0.92307692 0.93493151 0.95970696 0.98113208 0.97735849 0.92307692 0.99596774 0.98449612 0.78857143 0.96819788] mean value: 0.9436516053144435 key: test_recall value: [1. 0.96774194 0.96774194 0.93548387 0.96774194 0.96774194 0.83870968 0.83870968 1. 0.9 ] mean value: 0.9383870967741935 key: train_recall value: [0.99280576 0.98201439 0.94244604 0.9352518 0.93165468 0.99280576 0.88848921 0.91366906 0.99280576 0.98207885] mean value: 0.9554021299089761 key: test_roc_auc value: [0.9516129 0.93548387 0.96774194 0.93548387 0.9516129 0.9516129 0.91935484 0.91935484 0.91666667 0.91774194] mean value: 0.9366666666666668 key: train_roc_auc value: [0.95503597 0.95683453 0.95143885 0.95863309 0.95503597 0.95503597 0.94244604 0.94964029 0.86378639 0.97485238] mean value: 0.946273948583069 key: test_jcc value: [0.91176471 0.88235294 0.9375 0.87878788 0.90909091 0.90909091 0.83870968 0.83870968 0.86111111 0.84375 ] mean value: 0.8810867809978341 key: train_jcc value: [0.91694352 0.91919192 0.90657439 0.91872792 0.91197183 0.91694352 0.88530466 0.90070922 0.78409091 0.95138889] mean value: 0.9011846780361379 MCC on Blind test: 0.13 Accuracy on Blind test: 0.33 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.10950232 0.09739041 0.09704351 0.09424162 0.09430504 0.09652781 0.09616399 0.10172677 0.10454583 0.09388137] mean value: 0.09853286743164062 key: score_time value: [0.01543546 0.014148 0.01423931 0.01424742 0.0141356 0.0143621 0.01467228 0.01546311 0.01425433 0.01426816] mean value: 0.014522576332092285 key: test_mcc value: [0.96824584 0.96824584 0.93548387 0.96824584 0.93743687 0.96824584 1. 1. 0.90586325 0.93649139] mean value: 0.958825873085774 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 0.98387097 0.96774194 0.98387097 0.96774194 0.98387097 1. 1. 0.95081967 0.96721311] mean value: 0.9789000528820729 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98360656 0.98360656 0.96774194 0.98412698 0.96875 0.98360656 1. 1. 0.95384615 0.96774194] mean value: 0.9793026681072028 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96774194 0.96875 0.93939394 1. 1. 1. 0.91176471 0.9375 ] mean value: 0.9725150580760163 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_recall value: [0.96774194 0.96774194 0.96774194 1. 1. 0.96774194 1. 1. 1. 1. ] mean value: 0.9870967741935484 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 0.98387097 0.96774194 0.98387097 0.96774194 0.98387097 1. 1. 0.95 0.96774194] mean value: 0.9788709677419355 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96774194 0.96774194 0.9375 0.96875 0.93939394 0.96774194 1. 1. 0.91176471 0.9375 ] mean value: 0.9598134451727905 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.21 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, 
reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03574371 0.03871679 0.05276203 0.05378747 0.05436444 0.04580855 0.04448795 0.03396726 0.04315591 0.03275442] mean value: 0.04355485439300537 key: score_time value: [0.02251959 0.03061891 0.03506684 0.03417039 0.03329325 0.02393937 0.03167629 0.02193832 0.03418398 0.01938605] mean value: 0.028679299354553222 key: test_mcc value: [1. 0.87096774 1. 0.96824584 0.90369611 0.90748521 0.93743687 0.96824584 1. 0.8688172 ] mean value: 0.9424894812989454 key: train_mcc value: [0.99640932 0.99640932 0.99280576 0.99283145 0.98561151 0.99280576 0.99640932 0.99640932 0.99641572 0.99641577] mean value: 0.9942523261997296 key: test_accuracy value: [1. 0.93548387 1. 0.98387097 0.9516129 0.9516129 0.96774194 0.98387097 1. 0.93442623] mean value: 0.9708619777895293 key: train_accuracy value: [0.99820144 0.99820144 0.99640288 0.99640288 0.99280576 0.99640288 0.99820144 0.99820144 0.99820467 0.99820467] mean value: 0.9971229479612002 key: test_fscore value: [1. 0.93548387 1. 
0.98412698 0.95238095 0.94915254 0.96666667 0.98360656 1. 0.93333333] mean value: 0.9704750907225609 key: train_fscore value: [0.9981982 0.9981982 0.99640288 0.99638989 0.99280576 0.99640288 0.99820467 0.9981982 0.9981982 0.99820467] mean value: 0.997120353100802 key: test_precision value: [1. 0.93548387 1. 0.96875 0.9375 1. 1. 1. 1. 0.93333333] mean value: 0.9775067204301076 key: train_precision value: [1. 1. 0.99640288 1. 0.99280576 0.99640288 0.99641577 1. 1. 1. ] mean value: 0.9982027281400686 key: test_recall value: [1. 0.93548387 1. 1. 0.96774194 0.90322581 0.93548387 0.96774194 1. 0.93333333] mean value: 0.9643010752688173 key: train_recall value: [0.99640288 0.99640288 0.99640288 0.99280576 0.99280576 0.99640288 1. 0.99640288 0.99640288 0.99641577] mean value: 0.9960444547587737 key: test_roc_auc value: [1. 0.93548387 1. 0.98387097 0.9516129 0.9516129 0.96774194 0.98387097 1. 0.9344086 ] mean value: 0.9708602150537635 key: train_roc_auc value: [0.99820144 0.99820144 0.99640288 0.99640288 0.99280576 0.99640288 0.99820144 0.99820144 0.99820144 0.99820789] mean value: 0.9971229468038473 key: test_jcc value: [1. 0.87878788 1. 0.96875 0.90909091 0.90322581 0.93548387 0.96774194 1. 
0.875 ] mean value: 0.9438080400782014 key: train_jcc value: [0.99640288 0.99640288 0.99283154 0.99280576 0.98571429 0.99283154 0.99641577 0.99640288 0.99640288 0.99641577] mean value: 0.994262617555725 MCC on Blind test: 0.06 Accuracy on Blind test: 0.21 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.18538761 0.21855521 0.19818258 0.16929436 0.21186471 0.22627425 0.19866776 0.15631485 0.20869946 0.18409514] mean value: 0.19573359489440917 key: score_time value: [0.02069068 0.01292896 0.04060698 0.02179265 0.02101707 0.0241735 0.01276922 0.02045441 0.02064705 0.03370333] mean value: 0.022878384590148924 key: test_mcc value: [0.90748521 0.62471615 0.77459667 0.83914639 0.7190925 0.80813523 0.64820372 0.83914639 0.63939757 0.77096774] mean value: 0.7570887579844512 key: train_mcc value: [0.88143754 0.84999939 0.88509826 0.86366703 0.87437795 0.88157448 0.87826623 0.87806148 0.8713058 0.88511972] mean value: 0.8748907880078497 key: test_accuracy value: [0.9516129 0.80645161 0.88709677 0.91935484 0.85483871 0.90322581 0.82258065 0.91935484 0.81967213 0.8852459 ] mean value: 
0.8769434161819143 key: train_accuracy value: [0.94064748 0.92446043 0.94244604 0.93165468 0.93705036 0.94064748 0.93884892 0.93884892 0.93536804 0.94254937] mean value: 0.9372521731268486 key: test_fscore value: [0.94915254 0.82352941 0.88888889 0.92063492 0.86567164 0.90625 0.83076923 0.91803279 0.82539683 0.8852459 ] mean value: 0.8813572150143087 key: train_fscore value: [0.94117647 0.92631579 0.9430605 0.93262411 0.93783304 0.94138544 0.93992933 0.93971631 0.93639576 0.94285714] mean value: 0.9381293887479757 key: test_precision value: [1. 0.75675676 0.875 0.90625 0.80555556 0.87878788 0.79411765 0.93333333 0.8125 0.87096774] mean value: 0.8633268913427832 key: train_precision value: [0.93286219 0.90410959 0.93309859 0.91958042 0.92631579 0.92982456 0.92361111 0.92657343 0.92013889 0.93950178] mean value: 0.9255616347793583 key: test_recall value: [0.90322581 0.90322581 0.90322581 0.93548387 0.93548387 0.93548387 0.87096774 0.90322581 0.83870968 0.9 ] mean value: 0.9029032258064515 key: train_recall value: [0.94964029 0.94964029 0.95323741 0.94604317 0.94964029 0.95323741 0.95683453 0.95323741 0.95323741 0.94623656] mean value: 0.9510984760578634 key: test_roc_auc value: [0.9516129 0.80645161 0.88709677 0.91935484 0.85483871 0.90322581 0.82258065 0.91935484 0.81935484 0.88548387] mean value: 0.8769354838709678 key: train_roc_auc value: [0.94064748 0.92446043 0.94244604 0.93165468 0.93705036 0.94064748 0.93884892 0.93884892 0.93540007 0.94254274] mean value: 0.9372547123591449 key: test_jcc value: [0.90322581 0.7 0.8 0.85294118 0.76315789 0.82857143 0.71052632 0.84848485 0.7027027 0.79411765] mean value: 0.790372782026632 key: train_jcc value: [0.88888889 0.8627451 0.89225589 0.87375415 0.88294314 0.88926174 0.88666667 0.88628763 0.88039867 0.89189189] mean value: 0.8835093775860033 MCC on Blind test: 0.22 Accuracy on Blind test: 0.49 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', 
LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', 
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.24915671 0.24478769 0.25460291 0.25357485 0.25452995 0.24537802 0.2443285 0.24822903 0.24816132 0.25430918] mean value: 0.24970581531524658 key: score_time value: [0.00863647 0.0090971 0.00848818 0.00925422 0.00943804 0.00865912 0.00864434 0.00895667 0.00889111 0.00870037] mean value: 0.008876562118530273 key: test_mcc value: [1. 0.87096774 1. 0.96824584 0.93743687 0.90748521 0.96824584 0.96824584 1. 0.8688172 ] mean value: 0.9489444535426244 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.93548387 1. 0.98387097 0.96774194 0.9516129 0.98387097 0.98387097 1. 0.93442623] mean value: 0.9740877842411423 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.93548387 1. 0.98412698 0.96875 0.94915254 0.98360656 0.98360656 1. 0.93333333] mean value: 0.9738059845555039 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.93548387 1. 0.96875 0.93939394 1. 1. 1. 1. 0.93333333] mean value: 0.9776961143695014 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.93548387 1. 1. 1. 
0.90322581 0.96774194 0.96774194 1. 0.93333333] mean value: 0.970752688172043 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.93548387 1. 0.98387097 0.96774194 0.9516129 0.98387097 0.98387097 1. 0.9344086 ] mean value: 0.9740860215053764 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.87878788 1. 0.96875 0.93939394 0.90322581 0.96774194 0.96774194 1. 0.875 ] mean value: 0.9500641495601173 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.19 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") 
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1,
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01151943 0.01374769 0.01429176 0.01392269 0.01409912 0.01630855 0.01395249 0.01366401 0.01401973 0.01421189] mean value: 0.013973736763000488 key: score_time value: [0.01094151 0.01091838 0.01081514 0.01111507 0.01110363 0.01155281 0.01108122 0.01173496 0.0118649 0.0110836 ] mean value: 0.01122112274169922 key: test_mcc value: [0.75623534 0.7130241 0.67419986 0.87831007 0.35659298 0.7284928 0.61807005 0.87278605 0.70874158 0.47128445] mean value: 0.6777737268610616 key: train_mcc value: [0.7898587 0.84192273 0.79323895 0.88226013 0.52711711 0.8046478 0.84911865 0.84598626 0.839052 0.5797551 ] mean value: 0.7752957434463477 key: test_accuracy value: [0.87096774 0.85483871 0.82258065 0.93548387 0.64516129 0.85483871 0.80645161 0.93548387 0.85245902 0.72131148] mean value: 0.8299576943416181 key: train_accuracy value: [0.88848921 0.92086331 0.88848921 0.94064748 0.71942446 0.89748201 0.92446043 0.92266187 0.91741472 0.76481149] mean value: 0.8784744197460703 key: test_fscore value: [0.88235294 0.86153846 0.79245283 0.93939394 0.5 0.86956522 0.79310345 0.9375 0.84745763 0.65306122] mean value: 0.8076425689573157 key: train_fscore value: [0.89768977 0.92 0.876 0.94200351 0.6119403 0.9048414 0.92363636 0.92416226 0.91287879 0.70561798] mean value: 0.861877037129891 key: test_precision value: [0.81081081 0.82352941 0.95454545 0.88571429 0.84615385 0.78947368 0.85185185 0.90909091 0.89285714 0.84210526] mean value: 0.8606132660157428 key: train_precision value: [0.82926829 0.93014706 0.98648649 0.9209622 0.99193548 0.84423676 
0.93382353 0.90657439 0.964 0.94578313] mean value: 0.9253217337706788 key: test_recall value: [0.96774194 0.90322581 0.67741935 1. 0.35483871 0.96774194 0.74193548 0.96774194 0.80645161 0.53333333] mean value: 0.7920430107526881 key: train_recall value: [0.97841727 0.91007194 0.78776978 0.96402878 0.44244604 0.97482014 0.91366906 0.94244604 0.86690647 0.56272401] mean value: 0.8343299553905263 key: test_roc_auc value: [0.87096774 0.85483871 0.82258065 0.93548387 0.64516129 0.85483871 0.80645161 0.93548387 0.85322581 0.71827957] mean value: 0.8297311827956989 key: train_roc_auc value: [0.88848921 0.92086331 0.88848921 0.94064748 0.71942446 0.89748201 0.92446043 0.92266187 0.91732421 0.76517496] mean value: 0.8785017147572265 key: test_jcc value: [0.78947368 0.75675676 0.65625 0.88571429 0.33333333 0.76923077 0.65714286 0.88235294 0.73529412 0.48484848] mean value: 0.6950397230060543 key: train_jcc value: [0.81437126 0.85185185 0.77935943 0.89036545 0.44086022 0.82621951 0.85810811 0.85901639 0.83972125 0.54513889] mean value: 0.7705012360490753 MCC on Blind test: 0.12 Accuracy on Blind test: 0.77 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.03200245 0.03018045 0.02014041 0.03684139 0.03164029 0.0324831 0.03526735 0.02932453 0.02801824 0.02917194] mean value: 0.0305070161819458 key: score_time value: [0.0254848 0.03329682 0.01865816 0.02000928 0.02121401 0.01249504 0.01821375 0.02222514 0.02327013 0.02214599] mean value: 0.02170131206512451 key: test_mcc value: [0.96824584 0.84266484 0.90369611 0.93743687 0.90748521 0.93548387 0.90369611 0.90369611 0.90215054 0.80322581] mean value: 0.9007781314102745 key: train_mcc value: [0.94283651 0.93585746 0.92124484 0.9354697 0.93563929 0.92494527 0.91054923 0.93563929 0.92138939 0.92878086] mean value: 0.929235183258956 key: test_accuracy value: [0.98387097 0.91935484 0.9516129 0.96774194 0.9516129 0.96774194 0.9516129 0.9516129 0.95081967 0.90163934] mean value: 0.9497620306716024 key: train_accuracy value: [0.97122302 0.9676259 0.96043165 0.9676259 0.9676259 0.96223022 0.95503597 0.9676259 0.96050269 0.96409336] mean value: 0.9644020510700955 key: test_fscore value: [0.98412698 0.92307692 0.95081967 0.96875 0.95384615 0.96774194 0.95081967 0.95081967 0.95081967 0.9 ] mean value: 0.9500820685058522 key: train_fscore value: [0.97163121 0.96819788 0.96099291 0.96797153 0.96808511 0.96283186 0.95575221 0.96808511 0.96099291 0.96478873] mean value: 0.9649329447341147 key: test_precision value: [0.96875 0.88235294 0.96666667 0.93939394 0.91176471 0.96774194 0.96666667 0.96666667 0.96666667 0.9 ] mean value: 0.9436670188603301 key: train_precision value: [0.95804196 0.95138889 0.94755245 0.95774648 0.95454545 0.94773519 
0.94076655 0.95454545 0.94755245 0.94809689] mean value: 0.9507971757973318 key: test_recall value: [1. 0.96774194 0.93548387 1. 1. 0.96774194 0.93548387 0.93548387 0.93548387 0.9 ] mean value: 0.957741935483871 key: train_recall value: [0.98561151 0.98561151 0.97482014 0.97841727 0.98201439 0.97841727 0.97122302 0.98201439 0.97482014 0.98207885] mean value: 0.9795028493334365 key: test_roc_auc value: [0.98387097 0.91935484 0.9516129 0.96774194 0.9516129 0.96774194 0.9516129 0.9516129 0.95107527 0.9016129 ] mean value: 0.9497849462365592 key: train_roc_auc value: [0.97122302 0.9676259 0.96043165 0.9676259 0.9676259 0.96223022 0.95503597 0.9676259 0.96052835 0.96406101] mean value: 0.9644013821201104 key: test_jcc value: [0.96875 0.85714286 0.90625 0.93939394 0.91176471 0.9375 0.90625 0.90625 0.90625 0.81818182] mean value: 0.9057733320600968 key: train_jcc value: [0.94482759 0.93835616 0.92491468 0.93793103 0.93814433 0.92832765 0.91525424 0.93814433 0.92491468 0.93197279] mean value: 0.9322787467857844 MCC on Blind test: 0.19 Accuracy on Blind test: 0.44 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:163: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:166: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.25432348 0.2824018 0.20859909 0.19930434 0.2196362 0.19928908 0.19986963 0.19727373 0.24380755 0.21062398] mean value: 0.22151288986206055 key: score_time value: [0.02141023 0.0218761 0.02016091 0.01457238 0.01933861 0.01388955 0.01085019 0.01082468 0.0215745 0.02148724] mean value: 0.017598438262939452 key: test_mcc value: [0.96824584 0.84266484 0.90369611 0.93743687 0.87278605 0.96824584 0.93548387 0.90369611 0.9344086 0.83638369] mean value: 0.9103047822115174 key: train_mcc value: [0.94283651 0.94283651 0.93563929 0.9354697 0.93900081 0.92844206 0.93238486 0.9393413 0.93207468 0.9355825 ] mean value: 0.9363608223697346 key: test_accuracy value: [0.98387097 0.91935484 0.9516129 0.96774194 0.93548387 0.98387097 0.96774194 0.9516129 0.96721311 0.91803279] mean value: 0.9546536224219989 key: train_accuracy value: [0.97122302 0.97122302 0.9676259 0.9676259 0.96942446 0.96402878 0.96582734 0.96942446 0.96588869 0.96768402] mean value: 0.9679975588649368 key: test_fscore value: [0.98412698 0.92307692 0.95081967 0.96875 0.9375 0.98360656 0.96774194 0.95081967 0.96774194 0.91525424] mean value: 0.9549437917099128 key: train_fscore value: [0.97163121 0.97163121 0.96808511 0.96797153 0.96969697 0.96453901 0.9664903 0.9699115 0.96625222 0.96808511] mean value: 
0.9684294155648834 key: test_precision value: [0.96875 0.88235294 0.96666667 0.93939394 0.90909091 1. 0.96774194 0.96666667 0.96774194 0.93103448] mean value: 0.9499439476721016 key: train_precision value: [0.95804196 0.95804196 0.95454545 0.95774648 0.96113074 0.95104895 0.94809689 0.95470383 0.95438596 0.95789474] mean value: 0.9555636962921179 key: test_recall value: [1. 0.96774194 0.93548387 1. 0.96774194 0.96774194 0.96774194 0.93548387 0.96774194 0.9 ] mean value: 0.9609677419354838 key: train_recall value: [0.98561151 0.98561151 0.98201439 0.97841727 0.97841727 0.97841727 0.98561151 0.98561151 0.97841727 0.97849462] mean value: 0.9816624120058792 key: test_roc_auc value: [0.98387097 0.91935484 0.9516129 0.96774194 0.93548387 0.98387097 0.96774194 0.9516129 0.9672043 0.91774194] mean value: 0.9546236559139786 key: train_roc_auc value: [0.97122302 0.97122302 0.9676259 0.9676259 0.96942446 0.96402878 0.96582734 0.96942446 0.96591114 0.96766458] mean value: 0.9679978597766948 key: test_jcc value: [0.96875 0.85714286 0.90625 0.93939394 0.88235294 0.96774194 0.9375 0.90625 0.9375 0.84375 ] mean value: 0.9146631673197139 key: train_jcc value: [0.94482759 0.94482759 0.93814433 0.93793103 0.94117647 0.93150685 0.93515358 0.94158076 0.9347079 0.93814433] mean value: 0.9388000430005232 MCC on Blind test: 0.15 Accuracy on Blind test: 0.38 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', 
RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.02441096 0.02007389 0.02128315 0.01857615 0.01912689 0.01964045 0.02089906 0.01830864 0.02194476 0.0219276 ] mean value: 0.02061915397644043 key: score_time value: [0.01061249 0.01058674 0.01089358 0.01047206 0.01052213 0.01050425 0.01066399 0.01052094 0.01055193 0.01059175] mean value: 0.010591983795166016 key: test_mcc value: [0.56360186 0.56360186 0.75 0.68884672 0.8819171 0.82717019 0.9375 0.87083333 0.80753845 0.82078268] mean value: 0.7711792204154371 key: train_mcc value: [0.83904826 0.83305418 0.804094 0.83230783 0.81084496 0.79737782 0.84634011 0.79137125 0.81153605 0.79748625] mean value: 0.8163460715624157 key: test_accuracy value: [0.78125 0.78125 0.875 0.84375 0.9375 0.90625 0.96774194 0.93548387 0.90322581 0.90322581] mean value: 0.8834677419354838 key: train_accuracy value: [0.91901408 0.91549296 0.90140845 0.91549296 0.90492958 0.89788732 0.92280702 0.89473684 0.90526316 0.89824561] mean value: 0.9075277983691623 key: test_fscore value: [0.78787879 0.77419355 0.875 0.84848485 0.94117647 0.91428571 0.96774194 0.93333333 0.90909091 0.91428571] mean value: 0.886547126181851 key: train_fscore value: [0.9209622 0.91836735 0.90410959 0.91780822 0.90721649 0.90102389 0.92465753 0.89864865 0.90721649 0.90034364] mean value: 0.9100354060453281 key: test_precision value: [0.76470588 0.8 0.875 0.82352941 0.88888889 0.84210526 0.9375 0.93333333 0.88235294 0.84210526] mean value: 0.858952098383213 key: train_precision value: [0.89932886 0.88815789 0.88 0.89333333 0.88590604 0.87417219 0.90604027 0.86928105 0.88590604 
0.87919463] mean value: 0.8861320298178448 key: test_recall value: [0.8125 0.75 0.875 0.875 1. 1. 1. 0.93333333 0.9375 1. ] mean value: 0.9183333333333333 key: train_recall value: [0.94366197 0.95070423 0.92957746 0.94366197 0.92957746 0.92957746 0.94405594 0.93006993 0.92957746 0.92253521] mean value: 0.9352999113562493 key: test_roc_auc value: [0.78125 0.78125 0.875 0.84375 0.9375 0.90625 0.96875 0.93541667 0.90208333 0.9 ] mean value: 0.883125 key: train_roc_auc value: [0.91901408 0.91549296 0.90140845 0.91549296 0.90492958 0.89788732 0.9227322 0.89461243 0.90534817 0.89833054] mean value: 0.9075248694967005 key: test_jcc value: [0.65 0.63157895 0.77777778 0.73684211 0.88888889 0.84210526 0.9375 0.875 0.83333333 0.84210526] mean value: 0.8015131578947369 key: train_jcc value: [0.85350318 0.8490566 0.825 0.84810127 0.83018868 0.81987578 0.85987261 0.81595092 0.83018868 0.81875 ] mean value: 0.8350487720908194 MCC on Blind test: 0.22 Accuracy on Blind test: 0.54 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(),
 Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')),
 ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
 ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.582335 0.77262783 0.64726782 0.610358 0.68636823 0.73911905 0.64701462 0.70605779 0.70633364 0.64915323]
mean value: 0.6746635198593139

key: score_time
value: [0.02002954 0.01183629 0.01188588 0.01181579 0.01440883 0.01112986 0.01124287 0.01208949 0.01121664 0.01208615]
mean value: 0.012774133682250976

key: test_mcc
value: [0.68884672 0.68884672 0.81409158 0.93933644 0.93933644 0.93933644 0.87866878 1. 0.87083333 0.87770745]
mean value: 0.8637003892680549

key: train_mcc
value: [1. 0.99298237 0.94375558 0.93720088 0.97192739 0.95129413 0.96512319 0.93704438 0.9720266 0.9582759 ]
mean value: 0.9629630433458233

key: test_accuracy
value: [0.84375 0.84375 0.90625 0.96875 0.96875 0.96875 0.93548387 1. 0.93548387 0.93548387]
mean value: 0.9306451612903226

key: train_accuracy
value: [1. 0.99647887 0.97183099 0.96830986 0.98591549 0.97535211 0.98245614 0.96842105 0.98596491 0.97894737]
mean value: 0.9813676797627873

key: test_fscore
value: [0.83870968 0.83870968 0.90322581 0.96774194 0.96969697 0.96774194 0.9375 1. 0.9375 0.94117647]
mean value: 0.9302002472543269

key: train_fscore
value: [1. 0.99646643 0.97202797 0.96885813 0.98601399 0.97577855 0.98269896 0.96885813 0.98601399 0.97916667]
mean value: 0.9815882813444314

key: test_precision
value: [0.86666667 0.86666667 0.93333333 1. 0.94117647 1. 0.88235294 1. 0.9375 0.88888889]
mean value: 0.9316584967320262

key: train_precision
value: [1. 1. 0.96527778 0.95238095 0.97916667 0.95918367 0.97260274 0.95890411 0.97916667 0.96575342]
mean value: 0.9732436010934054

key: test_recall
value: [0.8125 0.8125 0.875 0.9375 1. 0.9375 1. 1. 0.9375 1. ]
mean value: 0.93125

key: train_recall
value: [1. 0.99295775 0.97887324 0.98591549 0.99295775 0.99295775 0.99300699 0.97902098 0.99295775 0.99295775]
mean value: 0.9901605436816705

key: test_roc_auc
value: [0.84375 0.84375 0.90625 0.96875 0.96875 0.96875 0.9375 1. 0.93541667 0.93333333]
mean value: 0.930625

key: train_roc_auc
value: [1. 0.99647887 0.97183099 0.96830986 0.98591549 0.97535211 0.98241899 0.96838373 0.98598936 0.97899636]
mean value: 0.981367576085886

key: test_jcc
value: [0.72222222 0.72222222 0.82352941 0.9375 0.94117647 0.9375 0.88235294 1. 0.88235294 0.88888889]
mean value: 0.8737745098039216

key: train_jcc
value: [1. 0.99295775 0.94557823 0.93959732 0.97241379 0.9527027 0.96598639 0.93959732 0.97241379 0.95918367]
mean value: 0.9640430965580683

MCC on Blind test: 0.17
Accuracy on Blind test: 0.43

Model_name: Gaussian NB
Model func: GaussianNB()
List of models:
[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(),
 Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')),
 ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
 ('model', GaussianNB())])

key: fit_time
value: [0.00992465 0.00951338 0.00734925 0.007375 0.00736427 0.00711226 0.007195 0.00721049 0.00700927 0.00702572]
mean value: 0.007707929611206055

key: score_time
value: [0.0107615 0.00938344 0.00825024 0.0080514 0.00798678 0.00803542 0.00796485 0.00782204 0.00787354 0.00787258]
mean value: 0.008400177955627442

key: test_mcc
value: [0.625 0.62994079 0.62994079 0.68884672 0.75592895 0.75592895 0.69203857 0.6125 0.69203857 0.82078268]
mean value: 0.6902946005610465

key: train_mcc
value: [0.74714613 0.73268511 0.71170894 0.74714613 0.71859502 0.73355944 0.73273302 0.71308876 0.7285593 0.7124563 ]
mean value: 0.7277678155822408

key: test_accuracy
value: [0.8125 0.8125 0.8125 0.84375 0.875 0.875 0.83870968 0.80645161 0.83870968 0.90322581]
mean value: 0.8418346774193548

key: train_accuracy
value: [0.87323944 0.86619718 0.8556338 0.87323944 0.85915493 0.86619718 0.86315789 0.85614035 0.86315789 0.85614035]
mean value: 0.8632258463059056

key: test_fscore
value: [0.8125 0.82352941 0.82352941 0.84848485 0.88235294 0.88235294 0.84848485 0.8 0.82758621 0.91428571]
mean value: 0.8463106324034316

key: train_fscore
value: [0.87586207 0.86805556 0.85813149 0.87586207 0.86111111 0.86986301 0.87213115 0.86006826 0.86779661 0.85714286]
mean value: 0.8666024180424603

key: test_precision
value: [0.8125 0.77777778 0.77777778 0.82352941 0.83333333 0.83333333 0.77777778 0.8 0.92307692 0.84210526]
mean value: 0.8201211597999524

key: train_precision
value: [0.85810811 0.85616438 0.84353741 0.85810811 0.84931507 0.84666667 0.82098765 0.84 0.83660131 0.84827586]
mean value:
0.845776457348316 key: test_recall value: [0.8125 0.875 0.875 0.875 0.9375 0.9375 0.93333333 0.8 0.75 1. ] mean value: 0.8795833333333334 key: train_recall value: [0.8943662 0.88028169 0.87323944 0.8943662 0.87323944 0.8943662 0.93006993 0.88111888 0.90140845 0.86619718] mean value: 0.8888653599921206 key: test_roc_auc value: [0.8125 0.8125 0.8125 0.84375 0.875 0.875 0.84166667 0.80625 0.84166667 0.9 ] mean value: 0.8420833333333333 key: train_roc_auc value: [0.87323944 0.86619718 0.8556338 0.87323944 0.85915493 0.86619718 0.86292229 0.8560524 0.86329164 0.85617551] mean value: 0.8632103811681276 key: test_jcc value: [0.68421053 0.7 0.7 0.73684211 0.78947368 0.78947368 0.73684211 0.66666667 0.70588235 0.84210526] mean value: 0.7351496388028895 key: train_jcc value: [0.7791411 0.76687117 0.75151515 0.7791411 0.75609756 0.76969697 0.77325581 0.75449102 0.76646707 0.75 ] mean value: 0.7646676954206684 MCC on Blind test: 0.22 Accuracy on Blind test: 0.59 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.0074749 0.00728703 0.00717139 0.00743604 0.00727081 0.00725842 0.00724769 0.00720358 0.00726557 0.00720382] mean value: 0.0072819232940673825 key: score_time value: [0.00798202 0.00782013 0.00794554 0.00785279 0.00791287 0.00786495 0.0078876 0.00792551 0.00783515 0.00804877] mean value: 0.007907533645629882 key: test_mcc value: [0.68884672 0.56360186 0.68884672 0.625 0.438357 0.68884672 0.48954403 0.48333333 0.55573827 0.55573827] mean value: 0.5777852941864914 key: train_mcc value: [0.64814452 0.64814452 0.6479516 0.63405443 0.65572679 0.62714946 0.62393794 0.65616074 0.64212548 0.6494089 ] mean value: 0.6432804381067745 key: test_accuracy value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375 0.74193548 0.74193548 0.77419355 0.77419355] mean value: 0.7876008064516129 key: train_accuracy value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028 0.81052632 0.82807018 0.82105263 0.8245614 ] mean value: 0.8213787991104522 key: test_fscore value: [0.83870968 0.77419355 0.84848485 0.8125 0.70967742 0.84848485 0.75 0.73333333 0.8 0.8 ] mean value: 0.791538367546432 key: train_fscore value: [0.82638889 0.82638889 0.82269504 0.81944444 0.83161512 0.816609 0.82 0.82807018 0.82105263 0.82638889] mean value: 0.8238653070404355 key: test_precision value: [0.86666667 0.8 0.82352941 0.8125 0.73333333 0.82352941 0.70588235 0.73333333 0.73684211 0.73684211] mean value: 0.7772458720330238 key: train_precision value: [0.81506849 0.81506849 0.82857143 0.80821918 0.81208054 0.80272109 0.78343949 0.83098592 0.81818182 0.81506849] mean value: 
0.8129404935574437 key: test_recall value: [0.8125 0.75 0.875 0.8125 0.6875 0.875 0.8 0.73333333 0.875 0.875 ] mean value: 0.8095833333333333 key: train_recall value: [0.83802817 0.83802817 0.81690141 0.83098592 0.85211268 0.83098592 0.86013986 0.82517483 0.82394366 0.83802817] mean value: 0.8354328769821727 key: test_roc_auc value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375 0.74375 0.74166667 0.77083333 0.77083333] mean value: 0.7870833333333334 key: train_roc_auc value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028 0.81035162 0.82808037 0.82106274 0.82460849] mean value: 0.8213680685511672 key: test_jcc value: [0.72222222 0.63157895 0.73684211 0.68421053 0.55 0.73684211 0.6 0.57894737 0.66666667 0.66666667] mean value: 0.6573976608187134 key: train_jcc value: [0.70414201 0.70414201 0.69879518 0.69411765 0.71176471 0.69005848 0.69491525 0.70658683 0.69642857 0.70414201] mean value: 0.7005092700712355 MCC on Blind test: 0.19 Accuracy on Blind test: 0.54 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00720549 0.00690293 0.00749421 0.0067699 0.00747395 0.00748181 0.00723648 0.00759244 0.0074594 0.00673008] mean value: 0.0072346687316894535 key: score_time value: [0.01040697 0.01126409 0.01092076 0.01008987 0.01062059 0.01053739 0.01394534 0.01144624 0.01064205 0.01186824] mean value: 0.011174154281616212 key: test_mcc value: [0.62994079 0.31311215 0.56360186 0.56360186 0.31814238 0.82717019 0.82285074 0.67916667 0.57461167 0.68826048] mean value: 0.5980458781591939 key: train_mcc value: [0.71838112 0.74655293 0.7253701 0.74655293 0.71142639 0.68311553 0.69826652 0.67718901 0.70556653 0.67774254] mean value: 0.7090163590202463 key: test_accuracy value: [0.8125 0.65625 0.78125 0.78125 0.65625 0.90625 0.90322581 0.83870968 0.77419355 0.83870968] mean value: 0.794858870967742 key: train_accuracy value: [0.85915493 0.87323944 0.86267606 0.87323944 0.8556338 0.8415493 0.84912281 0.83859649 0.85263158 0.83859649] mean value: 0.8544440326167532 key: test_fscore value: [0.8 0.66666667 0.78787879 0.77419355 0.62068966 0.91428571 0.90909091 0.83870968 0.81081081 0.85714286] mean value: 0.7979468626854611 key: train_fscore value: [0.85815603 0.87412587 0.86315789 0.87412587 0.85714286 0.8409894 0.84912281 0.83916084 0.85416667 0.83453237] mean value: 0.8544680614739297 key: test_precision value: [0.85714286 0.64705882 0.76470588 0.8 0.69230769 0.84210526 0.83333333 0.8125 0.71428571 0.78947368] mean value: 0.7752913250320371 key: train_precision value: [0.86428571 0.86805556 0.86013986 0.86805556 0.84827586 0.84397163 0.85211268 0.83916084 
0.84246575 0.85294118] mean value: 0.8539464623923748 key: test_recall value: [0.75 0.6875 0.8125 0.75 0.5625 1. 1. 0.86666667 0.9375 0.9375 ] mean value: 0.8304166666666667 key: train_recall value: [0.85211268 0.88028169 0.86619718 0.88028169 0.86619718 0.83802817 0.84615385 0.83916084 0.86619718 0.81690141] mean value: 0.8551511868413277 key: test_roc_auc value: [0.8125 0.65625 0.78125 0.78125 0.65625 0.90625 0.90625 0.83958333 0.76875 0.83541667] mean value: 0.794375 key: train_roc_auc value: [0.85915493 0.87323944 0.86267606 0.87323944 0.8556338 0.8415493 0.84913326 0.8385945 0.85267901 0.83852063] mean value: 0.854442036836403 key: test_jcc value: [0.66666667 0.5 0.65 0.63157895 0.45 0.84210526 0.83333333 0.72222222 0.68181818 0.75 ] mean value: 0.672772461456672 key: train_jcc value: [0.7515528 0.77639752 0.75925926 0.77639752 0.75 0.72560976 0.73780488 0.72289157 0.74545455 0.71604938] mean value: 0.7461417213928212 MCC on Blind test: 0.18 Accuracy on Blind test: 0.56 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01138067 0.01109099 0.01127625 0.01128221 0.01073098 0.0110662 0.01142335 0.01140451 0.01070118 0.01133704] mean value: 0.011169338226318359 key: score_time value: [0.00933075 0.00924778 0.00916719 0.00923133 0.00920081 0.00921702 0.00934625 0.00917053 0.00837874 0.0092051 ] mean value: 0.009149551391601562 key: test_mcc value: [0.625 0.50395263 0.57265629 0.64549722 0.81409158 0.77459667 0.76948376 0.80833333 0.6310315 0.76594169] mean value: 0.6910584675818924 key: train_mcc value: [0.7618988 0.7476577 0.76035829 0.75897979 0.73060671 0.72554232 0.7375982 0.72956319 0.72987459 0.71397006] mean value: 0.7396049640194965 key: test_accuracy value: [0.8125 0.75 0.78125 0.8125 0.90625 0.875 0.87096774 0.90322581 0.80645161 0.87096774] mean value: 0.8389112903225806 key: train_accuracy value: [0.87676056 0.86971831 0.87676056 0.87676056 0.86267606 0.85915493 0.86315789 0.85964912 0.85964912 0.85263158] mean value: 0.8656918705213739 key: test_fscore value: [0.8125 0.76470588 0.8 0.83333333 0.90909091 0.88888889 0.88235294 0.90322581 0.83333333 0.88888889] mean value: 0.8516319983516378 key: train_fscore value: [0.8852459 0.87868852 0.88448845 0.88372093 0.87043189 0.86842105 0.87459807 0.87096774 0.87012987 0.8627451 ] mean value: 0.8749437532470357 key: test_precision value: [0.8125 0.72222222 0.73684211 0.75 0.88235294 0.8 0.78947368 0.875 0.75 0.8 ] mean value: 0.7918390952872377 key: train_precision value: [0.82822086 0.82208589 0.83229814 0.83647799 0.82389937 0.81481481 0.80952381 0.80838323 0.80722892 0.80487805] mean value: 
0.8187811065917483 key: test_recall value: [0.8125 0.8125 0.875 0.9375 0.9375 1. 1. 0.93333333 0.9375 1. ] mean value: 0.9245833333333333 key: train_recall value: [0.95070423 0.94366197 0.94366197 0.93661972 0.92253521 0.92957746 0.95104895 0.94405594 0.94366197 0.92957746] mean value: 0.9395104895104895 key: test_roc_auc value: [0.8125 0.75 0.78125 0.8125 0.90625 0.875 0.875 0.90416667 0.80208333 0.86666667] mean value: 0.8385416666666666 key: train_roc_auc value: [0.87676056 0.86971831 0.87676056 0.87676056 0.86267606 0.85915493 0.86284842 0.85935192 0.85994287 0.85290062] mean value: 0.865687481532552 key: test_jcc value: [0.68421053 0.61904762 0.66666667 0.71428571 0.83333333 0.8 0.78947368 0.82352941 0.71428571 0.8 ] mean value: 0.744483266991007 key: train_jcc value: [0.79411765 0.78362573 0.79289941 0.79166667 0.77058824 0.76744186 0.77714286 0.77142857 0.77011494 0.75862069] mean value: 0.7777646609518236 MCC on Blind test: 0.23 Accuracy on Blind test: 0.48 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet.
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.94244218 1.01735759 0.8850472 1.05389714 0.87144995 0.99552441 0.90028095 0.87230182 1.03594947 0.87329125] mean value: 0.9447541952133178 key: score_time value: [0.01177907 0.013484 0.01336789 0.01362443 0.01371074 0.01331043 0.01345372 0.01344275 0.01364231 0.01379061] mean value: 0.013360595703125 key: test_mcc value: [0.68884672 0.68884672 0.69991324 0.875 0.8819171 0.875 0.80833333 0.9375 0.74166667 0.82078268] mean value: 0.8017806465017004 key: train_mcc value: [1. 0.99298237 0.99298237 0.99298237 0.98591549 0.99298237 0.98596474 0.9789707 0.99300699 0.99300665] mean value: 0.9908794051074042 key: test_accuracy value: [0.84375 0.84375 0.84375 0.9375 0.9375 0.9375 0.90322581 0.96774194 0.87096774 0.90322581] mean value: 0.8988911290322581 key: train_accuracy value: [1. 0.99647887 0.99647887 0.99647887 0.99295775 0.99647887 0.99298246 0.98947368 0.99649123 0.99649123] mean value: 0.9954311835927848 key: test_fscore value: [0.84848485 0.83870968 0.85714286 0.9375 0.94117647 0.9375 0.90322581 0.96774194 0.875 0.91428571] mean value: 0.9020767309856494 key: train_fscore value: [1. 
0.99646643 0.99646643 0.99646643 0.99295775 0.99646643 0.99300699 0.98954704 0.99649123 0.99646643] mean value: 0.99543351613606 key: test_precision value: [0.82352941 0.86666667 0.78947368 0.9375 0.88888889 0.9375 0.875 0.9375 0.875 0.84210526] mean value: 0.8773163914688682 key: train_precision value: [1. 1. 1. 1. 0.99295775 1. 0.99300699 0.98611111 0.99300699 1. ] mean value: 0.9965082843603971 key: test_recall value: [0.875 0.8125 0.9375 0.9375 1. 0.9375 0.93333333 1. 0.875 1. ] mean value: 0.9308333333333333 key: train_recall value: [1. 0.99295775 0.99295775 0.99295775 0.99295775 0.99295775 0.99300699 0.99300699 1. 0.99295775] mean value: 0.9943760464887226 key: test_roc_auc value: [0.84375 0.84375 0.84375 0.9375 0.9375 0.9375 0.90416667 0.96875 0.87083333 0.9 ] mean value: 0.89875 key: train_roc_auc value: [1. 0.99647887 0.99647887 0.99647887 0.99295775 0.99647887 0.99298237 0.98946124 0.9965035 0.99647887] mean value: 0.9954299221904855 key: test_jcc value: [0.73684211 0.72222222 0.75 0.88235294 0.88888889 0.88235294 0.82352941 0.9375 0.77777778 0.84210526] mean value: 0.8243571551427589 key: train_jcc value: [1. 
0.99295775 0.99295775 0.99295775 0.98601399 0.99295775 0.98611111 0.97931034 0.99300699 0.99295775] mean value: 0.9909231167354042 MCC on Blind test: 0.18 Accuracy on Blind test: 0.45 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), 
('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01128316 0.01111746 0.00982285 0.009413 0.00901008 0.00897694 0.00881934 0.00902772 0.00959873 0.00934815] mean value: 0.009641742706298828 key: score_time value: [0.01056886 0.00906277 0.00891089 0.00860476 0.0085628 0.00862837 0.00838184 0.00824451 0.00853562 0.00857925] mean value: 0.008807969093322755 key: test_mcc value: [0.81409158 0.68884672 0.875 1. 0.8819171 0.93933644 0.9375 1. 0.80833333 0.80753845] mean value: 0.8752563621702886 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90625 0.84375 0.9375 1. 0.9375 0.96875 0.96774194 1. 0.90322581 0.90322581] mean value: 0.9367943548387097 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.83870968 0.9375 1. 0.94117647 0.96774194 0.96774194 1. 
0.90322581 0.90909091] mean value: 0.9374277643608763 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88235294 0.86666667 0.9375 1. 0.88888889 1. 0.9375 1. 0.93333333 0.88235294] mean value: 0.932859477124183 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 0.8125 0.9375 1. 1. 0.9375 1. 1. 0.875 0.9375] mean value: 0.94375 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90625 0.84375 0.9375 1. 0.9375 0.96875 0.96875 1. 0.90416667 0.90208333] mean value: 0.936875 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.72222222 0.88235294 1. 0.88888889 0.9375 0.9375 1. 0.82352941 0.83333333] mean value: 0.8858660130718954 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.02 Accuracy on Blind test: 0.22 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.09702897 0.09689069 0.096277 0.09499073 0.09550571 0.09818435 0.09888387 0.09795642 0.09792018 0.09375334] mean value: 0.09673912525177002 key: score_time value: [0.01839042 0.01852298 0.0182128 0.01794076 0.01818895 0.01855779 0.01845098 0.01811409 0.01863813 0.01832128] mean value: 0.018333816528320314 key: test_mcc value: [0.68884672 0.68884672 0.68884672 0.62994079 0.81409158 0.93933644 0.9375 1. 0.87083333 0.87770745] mean value: 0.8135949748773968 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.84375 0.84375 0.84375 0.8125 0.90625 0.96875 0.96774194 1. 0.93548387 0.93548387] mean value: 0.9057459677419355 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.84848485 0.84848485 0.84848485 0.82352941 0.90909091 0.96969697 0.96774194 1. 0.9375 0.94117647] mean value: 0.9094190242079236 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.82352941 0.82352941 0.82352941 0.77777778 0.88235294 0.94117647 0.9375 1. 0.9375 0.88888889] mean value: 0.883578431372549 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.875 0.875 0.875 0.9375 1. 1. 1. 0.9375 1. ] mean value: 0.9375 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.84375 0.84375 0.84375 0.8125 0.90625 0.96875 0.96875 1. 0.93541667 0.93333333] mean value: 0.905625 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 
1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.73684211 0.73684211 0.73684211 0.7 0.83333333 0.94117647 0.9375 1. 0.88235294 0.88888889] mean value: 0.8393777949776402 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.23 Accuracy on Blind test: 0.45 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00812912 0.00792956 0.00760221 0.00785184 0.00799227 0.0079267 0.00804663 0.00806618 0.00824046 0.00799036] mean value: 0.007977533340454101 key: score_time value: [0.00857091 0.00843978 0.00853491 0.00855112 0.0085063 0.00849175 0.00858927 0.00863695 0.00858855 0.00860786] mean value: 0.008551740646362304 key: test_mcc value: [0.5 0.69991324 0.50395263 0.77459667 0.82717019 0.82717019 0.74689528 0.82078268 0.35983579 0.6125 ] mean value: 0.6672816673588119 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.75 0.84375 0.75 0.875 0.90625 0.90625 0.87096774 0.90322581 0.67741935 0.80645161] mean value: 0.8289314516129032 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.75 0.85714286 0.73333333 0.88888889 0.91428571 0.89655172 0.85714286 0.88888889 0.66666667 0.8125 ] mean value: 0.8265400930487138 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.78947368 0.78571429 0.8 0.84210526 1. 0.92307692 1. 0.71428571 0.8125 ] mean value: 0.8417155870445344 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.9375 0.6875 1. 1. 0.8125 0.8 0.8 0.625 0.8125] mean value: 0.8225 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.84375 0.75 0.875 0.90625 0.90625 0.86875 0.9 0.67916667 0.80625 ] mean value: 0.8285416666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.75 0.57894737 0.8 0.84210526 0.8125 0.75 0.8 0.5 0.68421053] mean value: 0.7117763157894736 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.49 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.19962478 1.20865035 1.21551681 1.2219255 1.21645331 1.22310948 1.23031688 1.22133994 1.21362448 1.22138143] mean value: 1.2171942949295045 key: score_time value: [0.15371752 0.09660053 0.09662104 0.09716916 0.09705114 0.09664798 0.09718585 0.09721947 0.09733677 0.09707975] mean value: 0.10266292095184326 key: test_mcc value: [0.81409158 0.875 0.875 0.8819171 0.8819171 1. 0.9375 1. 1. 0.9372467 ] mean value: 0.9202672483593498 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90625 0.9375 0.9375 0.9375 0.9375 1. 0.96774194 1. 1. 0.96774194] mean value: 0.9591733870967742 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.9375 0.9375 0.94117647 0.94117647 1. 0.96774194 1. 1. 0.96969697] mean value: 0.9603882755448221 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88235294 0.9375 0.9375 0.88888889 0.88888889 1. 0.9375 1. 1. 
0.94117647] mean value: 0.9413807189542484 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 0.9375 0.9375 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.98125 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90625 0.9375 0.9375 0.9375 0.9375 1. 0.96875 1. 1. 0.96666667] mean value: 0.9591666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.83333333 0.88235294 0.88235294 0.88888889 0.88888889 1. 0.9375 1. 1. 0.94117647] mean value: 0.9254493464052287 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.21 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.87549925 0.98825598 0.90560436 0.87923479 0.87618375 0.90693331 0.90976977 0.86167288 0.91743851 0.89844847] mean value: 0.9019041061401367 key: score_time value: [0.26297545 0.16814804 0.23696327 0.21888471 0.23352385 0.23716521 0.25041318 0.24322152 0.20568323 0.21444941] mean value: 0.2271427869796753 key: test_mcc value: [0.68884672 0.875 0.81409158 0.8819171 0.8819171 0.93933644 0.9375 1. 1. 0.9372467 ] mean value: 0.8955855640414887 key: train_mcc value: [0.96500412 0.95812669 0.95091647 0.93775982 0.94403659 0.94403659 0.95108379 0.94422558 0.94423649 0.94423649] mean value: 0.9483662624447999 key: test_accuracy value: [0.84375 0.9375 0.90625 0.9375 0.9375 0.96875 0.96774194 1. 1. 0.96774194] mean value: 0.9466733870967742 key: train_accuracy value: [0.98239437 0.97887324 0.97535211 0.96830986 0.97183099 0.97183099 0.9754386 0.97192982 0.97192982 0.97192982] mean value: 0.9739819619471214 key: test_fscore value: [0.84848485 0.9375 0.90909091 0.94117647 0.94117647 0.96969697 0.96774194 1. 1. 0.96969697] mean value: 0.9484564573630039 key: train_fscore value: [0.9825784 0.97916667 0.97560976 0.96907216 0.97222222 0.97222222 0.97577855 0.97241379 0.97222222 0.97222222] mean value: 0.9743508213630365 key: test_precision value: [0.82352941 0.9375 0.88235294 0.88888889 0.88888889 0.94117647 0.9375 1. 1. 0.94117647] mean value: 0.9241013071895424 key: train_precision value: [0.97241379 0.96575342 0.96551724 0.94630872 0.95890411 0.95890411 0.96575342 0.95918367 0.95890411 0.95890411] mean value: 0.9610546720455594 key: test_recall value: [0.875 0.9375 0.9375 1. 1. 
1. 1. 1. 1. 1. ] mean value: 0.975 key: train_recall value: [0.99295775 0.99295775 0.98591549 0.99295775 0.98591549 0.98591549 0.98601399 0.98601399 0.98591549 0.98591549] mean value: 0.9880478676253325 key: test_roc_auc value: [0.84375 0.9375 0.90625 0.9375 0.9375 0.96875 0.96875 1. 1. 0.96666667] mean value: 0.9466666666666667 key: train_roc_auc value: [0.98239437 0.97887324 0.97535211 0.96830986 0.97183099 0.97183099 0.97540136 0.97188023 0.97197873 0.97197873] mean value: 0.9739830591943268 key: test_jcc value: [0.73684211 0.88235294 0.83333333 0.88888889 0.88888889 0.94117647 0.9375 1. 1. 0.94117647] mean value: 0.905015909872721 key: train_jcc value: [0.96575342 0.95918367 0.95238095 0.94 0.94594595 0.94594595 0.9527027 0.94630872 0.94594595 0.94594595] mean value: 0.9500113261826575 MCC on Blind test: 0.15 Accuracy on Blind test: 0.3 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01972842 0.00707984 0.00703955 0.00708413 0.00708127 0.00711823 0.0071497 0.00713921 0.00711536 0.00711989] mean value: 0.008365559577941894 key: score_time value: [0.00945282 0.00773811 0.00782251 0.00773239 0.00773835 0.00778341 0.00787854 0.00775337 0.00779319 0.00774002] mean value: 0.007943272590637207 key: test_mcc value: [0.68884672 0.56360186 0.68884672 0.625 0.438357 0.68884672 0.48954403 0.48333333 0.55573827 0.55573827] mean value: 0.5777852941864914 key: train_mcc value: [0.64814452 0.64814452 0.6479516 0.63405443 0.65572679 0.62714946 0.62393794 0.65616074 0.64212548 0.6494089 ] mean value: 0.6432804381067745 key: test_accuracy value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375 0.74193548 0.74193548 0.77419355 0.77419355] mean value: 0.7876008064516129 key: train_accuracy value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028 0.81052632 0.82807018 0.82105263 0.8245614 ] mean value: 0.8213787991104522 key: test_fscore value: [0.83870968 0.77419355 0.84848485 0.8125 0.70967742 0.84848485 0.75 0.73333333 0.8 0.8 ] mean value: 0.791538367546432 key: train_fscore value: [0.82638889 0.82638889 0.82269504 0.81944444 0.83161512 0.816609 0.82 0.82807018 0.82105263 0.82638889] mean value: 0.8238653070404355 key: test_precision value: [0.86666667 0.8 0.82352941 0.8125 0.73333333 0.82352941 0.70588235 0.73333333 0.73684211 0.73684211] mean value: 0.7772458720330238 key: train_precision value: [0.81506849 0.81506849 0.82857143 0.80821918 0.81208054 0.80272109 0.78343949 0.83098592 0.81818182 0.81506849] mean value: 
0.8129404935574437 key: test_recall value: [0.8125 0.75 0.875 0.8125 0.6875 0.875 0.8 0.73333333 0.875 0.875 ] mean value: 0.8095833333333333 key: train_recall value: [0.83802817 0.83802817 0.81690141 0.83098592 0.85211268 0.83098592 0.86013986 0.82517483 0.82394366 0.83802817] mean value: 0.8354328769821727 key: test_roc_auc value: [0.84375 0.78125 0.84375 0.8125 0.71875 0.84375 0.74375 0.74166667 0.77083333 0.77083333] mean value: 0.7870833333333334 key: train_roc_auc value: [0.82394366 0.82394366 0.82394366 0.81690141 0.82746479 0.81338028 0.81035162 0.82808037 0.82106274 0.82460849] mean value: 0.8213680685511672 key: test_jcc value: [0.72222222 0.63157895 0.73684211 0.68421053 0.55 0.73684211 0.6 0.57894737 0.66666667 0.66666667] mean value: 0.6573976608187134 key: train_jcc value: [0.70414201 0.70414201 0.69879518 0.69411765 0.71176471 0.69005848 0.69491525 0.70658683 0.69642857 0.70414201] mean value: 0.7005092700712355 MCC on Blind test: 0.19 Accuracy on Blind test: 0.54 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 
'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.10866404 0.04417276 0.08032727 0.0377512 0.03836942 0.03934383 0.0412488 0.73144245 0.03698397 0.03865409] mean value: 0.11969578266143799 key: score_time value: [0.0095489 0.00957394 0.00984144 0.00939536 0.0093596 0.00946164 0.00942516 0.00999594 0.01063395 0.00950313] mean value: 0.00967390537261963 key: test_mcc value: [0.81409158 0.81409158 0.875 0.93933644 0.8819171 1. 0.9375 1. 0.9375 0.87770745] mean value: 0.9077144148609821 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90625 0.90625 0.9375 0.96875 0.9375 1. 0.96774194 1. 0.96774194 0.93548387] mean value: 0.9527217741935484 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.90322581 0.9375 0.96969697 0.94117647 1. 0.96774194 1. 0.96774194 0.94117647] mean value: 0.9537350497383704 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88235294 0.93333333 0.9375 0.94117647 0.88888889 1. 0.9375 1. 1. 0.88888889] mean value: 0.9409640522875817 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 0.875 0.9375 1. 1. 1. 1. 1. 0.9375 1. ] mean value: 0.96875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90625 0.90625 0.9375 0.96875 0.9375 1. 0.96875 1. 0.96875 0.93333333] mean value: 0.9527083333333334 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.83333333 0.82352941 0.88235294 0.94117647 0.88888889 1. 0.9375 1. 0.9375 0.88888889] mean value: 0.9133169934640523 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.2 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01153302 0.0145793 0.014395 0.0144124 0.01459241 0.0143168 0.01439118 0.01458287 0.01461196 0.0144515 ] mean value: 0.014186644554138183 key: score_time value: [0.01013279 0.01050425 0.0104897 0.0105176 0.01051116 0.01054525 0.01043344 0.01050496 0.01060581 0.01054263] mean value: 0.010478758811950683 key: test_mcc value: [0.81409158 0.81409158 0.93933644 1. 0.8819171 1. 0.87083333 1. 1. 0.9372467 ] mean value: 0.9257516728277053 key: train_mcc value: [0.95812669 0.95812669 0.94403659 0.93720088 0.94403659 0.93720088 0.95108379 0.95145657 0.94470481 0.9582759 ] mean value: 0.948424939171215 key: test_accuracy value: [0.90625 0.90625 0.96875 1. 0.9375 1. 0.93548387 1. 1. 
0.96774194] mean value: 0.9621975806451613 key: train_accuracy value: [0.97887324 0.97887324 0.97183099 0.96830986 0.97183099 0.96830986 0.9754386 0.9754386 0.97192982 0.97894737] mean value: 0.9739782554978997 key: test_fscore value: [0.90909091 0.90322581 0.96969697 1. 0.94117647 1. 0.93333333 1. 1. 0.96969697] mean value: 0.962622045885803 key: train_fscore value: [0.97916667 0.97916667 0.97222222 0.96885813 0.97222222 0.96885813 0.97577855 0.97594502 0.97241379 0.97916667] mean value: 0.9743798064418605 key: test_precision value: [0.88235294 0.93333333 0.94117647 1. 0.88888889 1. 0.93333333 1. 1. 0.94117647] mean value: 0.9520261437908497 key: train_precision value: [0.96575342 0.96575342 0.95890411 0.95238095 0.95890411 0.95238095 0.96575342 0.95945946 0.9527027 0.96575342] mean value: 0.9597745984732285 key: test_recall value: [0.9375 0.875 1. 1. 1. 1. 0.93333333 1. 1. 1. ] mean value: 0.9745833333333334 key: train_recall value: [0.99295775 0.99295775 0.98591549 0.98591549 0.98591549 0.98591549 0.98601399 0.99300699 0.99295775 0.99295775] mean value: 0.9894513936767458 key: test_roc_auc value: [0.90625 0.90625 0.96875 1. 0.9375 1. 0.93541667 1. 1. 0.96666667] mean value: 0.9620833333333333 key: train_roc_auc value: [0.97887324 0.97887324 0.97183099 0.96830986 0.97183099 0.96830986 0.97540136 0.97537674 0.97200335 0.97899636] mean value: 0.9739805968679208 key: test_jcc value: [0.83333333 0.82352941 0.94117647 1. 0.88888889 1. 0.875 1. 1. 
0.94117647] mean value: 0.9303104575163399 key: train_jcc value: [0.95918367 0.95918367 0.94594595 0.93959732 0.94594595 0.93959732 0.9527027 0.95302013 0.94630872 0.95918367] mean value: 0.9500669104935644 MCC on Blind test: 0.16 Accuracy on Blind test: 0.36 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.00940704 0.00748181 0.00721669 0.00730562 0.00781703 0.00796103 0.0077672 0.00788808 0.00784159 0.00789857] mean value: 0.007858467102050782 key: score_time value: [0.00908256 0.00800729 0.00791621 0.00769114 0.00820541 0.00846505 0.00846457 0.00859737 0.00852084 0.00845647] mean value: 0.008340692520141602 key: test_mcc value: [0.62994079 0.50395263 0.62994079 0.68884672 0.62994079 0.75592895 0.67916667 0.61925228 0.74689528 0.66057826] mean value: 0.6544443153383147 key: train_mcc value: [0.67386056 0.69575325 0.68038921 0.67508446 0.67277821 0.66621443 0.66189073 0.68037155 0.67635913 0.66649204] mean value: 0.67491935676675 key: test_accuracy value: [0.8125 0.75 0.8125 0.84375 0.8125 0.875 0.83870968 0.80645161 0.87096774 0.80645161] mean value: 0.822883064516129 key: train_accuracy value: [0.83450704 0.84507042 0.83802817 0.83450704 0.83450704 
0.83098592 0.82807018 0.83859649 0.83508772 0.83157895] mean value: 0.835093896713615 key: test_fscore value: [0.8 0.76470588 0.82352941 0.84848485 0.82352941 0.88235294 0.83870968 0.8125 0.88235294 0.84210526] mean value: 0.8318270377297392 key: train_fscore value: [0.84385382 0.85430464 0.84666667 0.84488449 0.84280936 0.84 0.83934426 0.84666667 0.84488449 0.83892617] mean value: 0.844234056793084 key: test_precision value: [0.85714286 0.72222222 0.77777778 0.82352941 0.77777778 0.83333333 0.8125 0.76470588 0.83333333 0.72727273] mean value: 0.7929595322977676 key: train_precision value: [0.79874214 0.80625 0.80379747 0.79503106 0.80254777 0.79746835 0.79012346 0.8089172 0.79503106 0.80128205] mean value: 0.7999190549175873 key: test_recall value: [0.75 0.8125 0.875 0.875 0.875 0.9375 0.86666667 0.86666667 0.9375 1. ] mean value: 0.8795833333333334 key: train_recall value: [0.8943662 0.9084507 0.8943662 0.90140845 0.88732394 0.88732394 0.8951049 0.88811189 0.90140845 0.88028169] mean value: 0.8938146360681573 key: test_roc_auc value: [0.8125 0.75 0.8125 0.84375 0.8125 0.875 0.83958333 0.80833333 0.86875 0.8 ] mean value: 0.8222916666666666 key: train_roc_auc value: [0.83450704 0.84507042 0.83802817 0.83450704 0.83450704 0.83098592 0.82783414 0.83842214 0.83531961 0.83174924] mean value: 0.8350930759381464 key: test_jcc value: [0.66666667 0.61904762 0.7 0.73684211 0.7 0.78947368 0.72222222 0.68421053 0.78947368 0.72727273] mean value: 0.7135209235209236 key: train_jcc value: [0.72988506 0.74566474 0.73410405 0.73142857 0.7283237 0.72413793 0.72316384 0.73410405 0.73142857 0.72254335] mean value: 0.7304783857563864 MCC on Blind test: 0.22 Accuracy on Blind test: 0.54 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', 
BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00985122 0.01013541 0.01139951 0.01236129 0.01133943 0.01101375 0.01197648 0.01175308 0.01180387 0.0115931 ] mean value: 0.011322712898254395 key: score_time value: [0.00835967 0.01045895 0.01057839 0.01042914 0.01039219 0.01044703 0.01043487 0.01063824 0.01045942 0.01041865] mean value: 0.0102616548538208 key: test_mcc value: [0.75592895 0.68884672 0.8819171 0.67419986 0.8819171 0.81409158 0.87866878 0.9375 0.87083333 0.9372467 ] mean value: 0.8321150124795701 key: train_mcc value: [0.97183099 0.92966968 0.93775982 0.8661418 0.92365817 0.90901439 0.95798651 0.9114673 0.78397114 0.94395469] mean value: 0.9135454491091561 key: test_accuracy value: [0.875 0.84375 0.9375 0.8125 0.9375 0.90625 0.93548387 0.96774194 0.93548387 0.96774194] mean value: 0.9118951612903226 key: train_accuracy value: [0.98591549 0.96478873 0.96830986 0.92957746 0.96126761 0.95422535 0.97894737 0.95438596 0.89122807 0.97192982] mean value: 0.9560575735112429 key: test_fscore value: [0.88235294 0.83870968 0.94117647 0.76923077 0.94117647 0.90909091 0.9375 0.96774194 0.9375 0.96969697] mean value: 0.9094176143274815 key: train_fscore value: [0.98591549 0.96503497 0.96907216 0.92481203 0.96219931 0.9550173 0.97916667 0.95622896 0.88727273 0.97202797] mean value: 0.9556747588965514 key: test_precision value: [0.83333333 0.86666667 0.88888889 1. 
0.88888889 0.88235294 0.88235294 0.9375 0.9375 0.94117647] mean value: 0.9058660130718954 key: train_precision value: [0.98591549 0.95833333 0.94630872 0.99193548 0.93959732 0.93877551 0.97241379 0.92207792 0.91729323 0.96527778] mean value: 0.9537928586676441 key: test_recall value: [0.9375 0.8125 1. 0.625 1. 0.9375 1. 1. 0.9375 1. ] mean value: 0.925 key: train_recall value: [0.98591549 0.97183099 0.99295775 0.86619718 0.98591549 0.97183099 0.98601399 0.99300699 0.85915493 0.97887324] mean value: 0.9591697035359007 key: test_roc_auc value: [0.875 0.84375 0.9375 0.8125 0.9375 0.90625 0.9375 0.96875 0.93541667 0.96666667] mean value: 0.9120833333333334 key: train_roc_auc value: [0.98591549 0.96478873 0.96830986 0.92957746 0.96126761 0.95422535 0.97892249 0.95424998 0.89111593 0.9719541 ] mean value: 0.9560326996946715 key: test_jcc value: [0.78947368 0.72222222 0.88888889 0.625 0.88888889 0.83333333 0.88235294 0.9375 0.88235294 0.94117647] mean value: 0.8391189370485036 key: train_jcc value: [0.97222222 0.93243243 0.94 0.86013986 0.92715232 0.91390728 0.95918367 0.91612903 0.79738562 0.94557823] mean value: 0.9164130675378523 MCC on Blind test: 0.18 Accuracy on Blind test: 0.49 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01099372 0.01085377 0.01167774 0.011415 0.01157284 0.01085854 0.01104522 0.01073241 0.01113367 0.01111031] mean value: 0.011139321327209472 key: score_time value: [0.0103929 0.0103898 0.01037788 0.0103898 0.01040602 0.01039386 0.01040792 0.01042295 0.01051378 0.01045513] mean value: 0.010415005683898925 key: test_mcc value: [0.44539933 0.32025631 0.81409158 0.57735027 0.77459667 0.75592895 0.87866878 0.9375 0.87083333 0.76594169] mean value: 0.714056690328539 key: train_mcc value: [0.87107074 0.62077843 0.83774371 0.57207859 0.80452795 0.84114227 0.89199759 0.83981496 0.95090121 0.86664533] mean value: 0.8096700777785382 key: test_accuracy value: [0.71875 0.625 0.90625 0.75 0.875 0.875 0.93548387 0.96774194 0.93548387 0.87096774] mean value: 0.8459677419354839 key: train_accuracy value: [0.93309859 0.77816901 0.91549296 0.75 0.8943662 0.91549296 0.94385965 0.91578947 0.9754386 0.92982456] mean value: 0.8951531999011614 key: test_fscore value: [0.68965517 0.45454545 0.90909091 0.66666667 0.88888889 0.88235294 0.9375 0.96774194 0.9375 0.88888889] mean value: 0.8222830857154942 key: train_fscore value: [0.92936803 0.71493213 0.9205298 0.66976744 0.90384615 0.92156863 0.94666667 0.92156863 0.9754386 0.93377483] mean value: 0.8837460905964674 key: test_precision value: [0.76923077 0.83333333 0.88235294 1. 0.8 0.83333333 0.88235294 0.9375 0.9375 0.8 ] mean value: 0.8675603318250378 key: train_precision value: [0.98425197 1. 
0.86875 0.98630137 0.82941176 0.8597561 0.9044586 0.86503067 0.97202797 0.88125 ] mean value: 0.9151238446234521 key: test_recall value: [0.625 0.3125 0.9375 0.5 1. 0.9375 1. 1. 0.9375 1. ] mean value: 0.825 key: train_recall value: [0.88028169 0.55633803 0.97887324 0.50704225 0.99295775 0.99295775 0.99300699 0.98601399 0.97887324 0.99295775] mean value: 0.8859302669161824 key: test_roc_auc value: [0.71875 0.625 0.90625 0.75 0.875 0.875 0.9375 0.96875 0.93541667 0.86666667] mean value: 0.8458333333333333 key: train_roc_auc value: [0.93309859 0.77816901 0.91549296 0.75 0.8943662 0.91549296 0.9436866 0.9155422 0.97545061 0.93004531] mean value: 0.8951344430217669 key: test_jcc value: [0.52631579 0.29411765 0.83333333 0.5 0.8 0.78947368 0.88235294 0.9375 0.88235294 0.8 ] mean value: 0.7245446336429309 key: train_jcc value: [0.86805556 0.55633803 0.85276074 0.5034965 0.8245614 0.85454545 0.89873418 0.85454545 0.95205479 0.8757764 ] mean value: 0.8040868505268339 MCC on Blind test: 0.08 Accuracy on Blind test: 0.19 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, 
enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.09240365 0.08133793 0.08125806 0.08046818 0.08067393 0.08131313 0.08116984 0.0813272 0.08116603 0.08120728] mean value: 0.08223252296447754 key: score_time value: [0.01535177 0.0154326 0.01515222 0.01522565 0.01519728 0.0153811 0.01536131 0.01532435 0.01529288 0.01531577] mean value: 0.015303492546081543 key: test_mcc value: [0.81409158 0.875 0.93933644 0.81409158 0.93933644 1. 0.9375 1. 1. 0.87770745] mean value: 0.9197063481549348 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90625 0.9375 0.96875 0.90625 0.96875 1. 0.96774194 1. 1. 0.93548387] mean value: 0.9590725806451613 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.9375 0.96969697 0.90322581 0.96969697 1. 0.96774194 1. 1. 0.94117647] mean value: 0.9598129061008568 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88235294 0.9375 0.94117647 0.93333333 0.94117647 1. 0.9375 1. 1. 0.88888889] mean value: 0.9461928104575164 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 0.9375 1. 0.875 1. 1. 1. 1. 1. 1. ] mean value: 0.975 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90625 0.9375 0.96875 0.90625 0.96875 1. 0.96875 1. 1. 0.93333333] mean value: 0.9589583333333334 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.83333333 0.88235294 0.94117647 0.82352941 0.94117647 1. 0.9375 1. 1. 0.88888889] mean value: 0.9247957516339869 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.19 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03232431 0.02845907 0.02876759 0.02964163 0.04149294 0.0316689 0.04340553 0.03582311 0.04470134 0.04033637] mean value: 0.035662078857421876 key: score_time value: [0.01759839 0.02218199 0.01889658 0.01943088 0.02917433 0.03231716 0.03496408 0.03411865 0.02071142 0.01735568] mean value: 0.02467491626739502 key: test_mcc value: [0.81409158 0.81409158 0.875 0.93933644 1. 1. 0.87866878 1. 0.87866878 0.9372467 ] mean value: 0.913710384964254 key: train_mcc value: [0.99298237 1. 0.99298237 1. 0.99298237 0.98591549 1. 0.98596474 0.99300665 0.98596474] mean value: 0.9929798730055359 key: test_accuracy value: [0.90625 0.90625 0.9375 0.96875 1. 1. 0.93548387 1. 0.93548387 0.96774194] mean value: 0.9557459677419354 key: train_accuracy value: [0.99647887 1. 0.99647887 1. 0.99647887 0.99295775 1. 
0.99298246 0.99649123 0.99298246] mean value: 0.996485050654806 key: test_fscore value: [0.90909091 0.90322581 0.9375 0.96774194 1. 1. 0.9375 1. 0.93333333 0.96969697] mean value: 0.9558088954056696 key: train_fscore value: [0.99646643 1. 0.99646643 1. 0.99646643 0.99295775 1. 0.99300699 0.99646643 0.99295775] mean value: 0.9964788210346365 key: test_precision value: [0.88235294 0.93333333 0.9375 1. 1. 1. 0.88235294 1. 1. 0.94117647] mean value: 0.957671568627451 key: train_precision value: [1. 1. 1. 1. 1. 0.99295775 1. 0.99300699 1. 0.99295775] mean value: 0.997892248596474 key: test_recall value: [0.9375 0.875 0.9375 0.9375 1. 1. 1. 1. 0.875 1. ] mean value: 0.95625 key: train_recall value: [0.99295775 1. 0.99295775 1. 0.99295775 0.99295775 1. 0.99300699 0.99295775 0.99295775] mean value: 0.9950753471880233 key: test_roc_auc value: [0.90625 0.90625 0.9375 0.96875 1. 1. 0.9375 1. 0.9375 0.96666667] mean value: 0.9560416666666667 key: train_roc_auc value: [0.99647887 1. 0.99647887 1. 0.99647887 0.99295775 1. 0.99298237 0.99647887 0.99298237] mean value: 0.9964837978922486 key: test_jcc value: [0.83333333 0.82352941 0.88235294 0.9375 1. 1. 0.88235294 1. 0.875 0.94117647] mean value: 0.9175245098039215 key: train_jcc value: [0.99295775 1. 0.99295775 1. 0.99295775 0.98601399 1. 
0.98611111 0.99295775 0.98601399] mean value: 0.9929970069054577 MCC on Blind test: 0.06 Accuracy on Blind test: 0.2 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.05620861 0.0974803 0.04944372 0.04494715 0.06924796 0.05304265 0.03282356 0.03296423 0.03626871 0.06697369] mean value: 0.0539400577545166 key: score_time value: [0.02177811 0.02000475 0.01141953 0.01396704 0.02782083 0.01147771 0.011482 0.01143765 0.01138997 0.02080035] mean value: 0.01615779399871826 key: test_mcc value: [0.62994079 0.438357 0.56360186 0.68884672 0.75 0.68884672 0.80833333 0.74166667 0.68826048 0.76594169] mean value: 0.6763795258534475 key: train_mcc value: [0.8612933 0.86052165 0.83971646 0.85382934 0.85314992 0.86794223 0.84766497 0.84023701 0.85436741 0.84697783] mean value: 0.8525700111060143 key: test_accuracy value: [0.8125 0.71875 0.78125 0.84375 0.875 0.84375 0.90322581 0.87096774 0.83870968 0.87096774] mean value: 0.8358870967741936 key: train_accuracy value: [0.92957746 0.92957746 0.91901408 0.92605634 0.92605634 0.93309859 0.92280702 0.91929825 0.92631579 0.92280702] mean value: 0.925460835186558 key: 
test_fscore value: [0.8 0.70967742 0.78787879 0.84848485 0.875 0.84848485 0.90322581 0.86666667 0.85714286 0.88888889] mean value: 0.838545012335335 key: train_fscore value: [0.93197279 0.93150685 0.92150171 0.92832765 0.92783505 0.93515358 0.92567568 0.9220339 0.92832765 0.92465753] mean value: 0.9276992378409221 key: test_precision value: [0.85714286 0.73333333 0.76470588 0.82352941 0.875 0.82352941 0.875 0.86666667 0.78947368 0.8 ] mean value: 0.8208381247235736 key: train_precision value: [0.90131579 0.90666667 0.89403974 0.90066225 0.90604027 0.90728477 0.89542484 0.89473684 0.90066225 0.9 ] mean value: 0.9006833409925814 key: test_recall value: [0.75 0.6875 0.8125 0.875 0.875 0.875 0.93333333 0.86666667 0.9375 1. ] mean value: 0.86125 key: train_recall value: [0.96478873 0.95774648 0.95070423 0.95774648 0.95070423 0.96478873 0.95804196 0.95104895 0.95774648 0.95070423] mean value: 0.9564020486555698 key: test_roc_auc value: [0.8125 0.71875 0.78125 0.84375 0.875 0.84375 0.90416667 0.87083333 0.83541667 0.86666667] mean value: 0.8352083333333333 key: train_roc_auc value: [0.92957746 0.92957746 0.91901408 0.92605634 0.92605634 0.93309859 0.92268295 0.91918645 0.92642569 0.92290456] mean value: 0.9254579927115139 key: test_jcc value: [0.66666667 0.55 0.65 0.73684211 0.77777778 0.73684211 0.82352941 0.76470588 0.75 0.8 ] mean value: 0.7256363949088407 key: train_jcc value: [0.87261146 0.87179487 0.85443038 0.86624204 0.86538462 0.87820513 0.86163522 0.85534591 0.86624204 0.85987261] mean value: 0.8651764280073164 MCC on Blind test: 0.18 Accuracy on Blind test: 0.54 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.16305089 0.16052961 0.15664053 0.15771556 0.15495181 0.15255976 0.15471911 0.15581322 0.15795827 0.15890527] mean value: 0.15728440284729003 key: score_time value: [0.00907922 0.00902605 0.00912547 0.0093677 0.00861001 0.00851989 0.00923562 0.00842047 0.00907159 0.00920391] mean value: 0.00896599292755127 key: test_mcc value: [0.81409158 0.875 0.875 1. 1. 1. 0.9375 1. 1. 0.9372467 ] mean value: 0.9438838276217104 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.90625 0.9375 0.9375 1. 1. 1. 0.96774194 1. 1. 0.96774194] mean value: 0.9716733870967742 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.90909091 0.9375 0.9375 1. 1. 1. 0.96774194 1. 1. 0.96969697] mean value: 0.972152981427175 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88235294 0.9375 0.9375 1. 1. 1. 0.9375 1. 1. 0.94117647] mean value: 0.9636029411764706 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.9375 0.9375 0.9375 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.98125 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.90625 0.9375 0.9375 1. 1. 1. 0.96875 1. 1. 0.96666667] mean value: 0.9716666666666667 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.83333333 0.88235294 0.88235294 1. 1. 1. 0.9375 1. 1. 0.94117647] mean value: 0.9476715686274509 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.19 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01113367 0.01251006 0.01261091 0.01772285 0.01263809 0.01343751 0.0127852 0.01266503 0.01297426 0.01285744] mean value: 0.013133502006530762 key: score_time value: [0.01069093 0.01078391 0.0107305 0.01083326 0.01099324 0.01084566 0.01136661 0.01079345 0.01084757 0.01162291] mean value: 0.010950803756713867 key: test_mcc value: [0.68884672 0.59215653 0.81409158 0.56360186 0.77459667 0.75 0.74896053 0.54812195 0.53006813 0.82078268] mean value: 0.6831226650318738 key: train_mcc value: [0.8145351 0.86223926 0.87332606 0.85924016 0.86725157 0.87541287 0.7742616 0.84773912 0.81144956 0.88848951] mean value: 0.8473944811490282 key: test_accuracy value: [0.84375 0.78125 0.90625 0.78125 0.875 0.875 0.87096774 0.77419355 0.74193548 0.90322581] mean value: 0.8352822580645162 key: train_accuracy value: [0.90140845 0.92957746 0.93661972 0.92957746 0.93309859 0.93661972 0.88070175 0.92280702 0.90175439 0.94385965] mean value: 
0.9216024215468248 key: test_fscore value: [0.83870968 0.81081081 0.90322581 0.78787879 0.88888889 0.875 0.875 0.75862069 0.69230769 0.91428571] mean value: 0.8344728067698034 key: train_fscore value: [0.89230769 0.92647059 0.93706294 0.92907801 0.9347079 0.93430657 0.89102564 0.92028986 0.89393939 0.94244604] mean value: 0.9201634638116422 key: test_precision value: [0.86666667 0.71428571 0.93333333 0.76470588 0.8 0.875 0.82352941 0.78571429 0.9 0.84210526] mean value: 0.8305340557275542 key: train_precision value: [0.98305085 0.96923077 0.93055556 0.93571429 0.91275168 0.96969697 0.82248521 0.95488722 0.96721311 0.96323529] mean value: 0.9408820939525007 key: test_recall value: [0.8125 0.9375 0.875 0.8125 1. 0.875 0.93333333 0.73333333 0.5625 1. ] mean value: 0.8541666666666666 key: train_recall value: [0.81690141 0.88732394 0.94366197 0.92253521 0.95774648 0.90140845 0.97202797 0.88811189 0.83098592 0.92253521] mean value: 0.9043238451689156 key: test_roc_auc value: [0.84375 0.78125 0.90625 0.78125 0.875 0.875 0.87291667 0.77291667 0.74791667 0.9 ] mean value: 0.8356250000000001 key: train_roc_auc value: [0.90140845 0.92957746 0.93661972 0.92957746 0.93309859 0.93661972 0.88038018 0.92292918 0.90150694 0.94378509] mean value: 0.9215502807052103 key: test_jcc value: [0.72222222 0.68181818 0.82352941 0.65 0.8 0.77777778 0.77777778 0.61111111 0.52941176 0.84210526] mean value: 0.7215753510335554 key: train_jcc value: [0.80555556 0.8630137 0.88157895 0.86754967 0.87741935 0.87671233 0.80346821 0.85234899 0.80821918 0.89115646] mean value: 0.8527022396082421 MCC on Blind test: 0.2 Accuracy on Blind test: 0.58 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), 
('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 
'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01892829 0.02864075 0.02735972 0.02840042 0.0284605 0.02859259 0.02847409 0.02875304 0.02716875 0.01797819] mean value: 0.026275634765625 key: score_time value: [0.01286125 0.01076007 0.01073647 0.01065683 0.01062083 0.01068902 0.01063824 0.03288555 0.0109849 0.02092481] mean value: 0.014175796508789062 key: test_mcc value: [0.68884672 0.75 0.81409158 0.93933644 0.8819171 1. 0.87083333 0.9372467 0.87083333 0.9372467 ] mean value: 0.8690351901199767 key: train_mcc value: [0.92994649 0.90955652 0.90901439 0.89492115 0.91585639 0.90955652 0.93741093 0.90253931 0.90988464 0.90897898] mean value: 0.9127665317222325 key: test_accuracy value: [0.84375 0.875 0.90625 0.96875 0.9375 1. 0.93548387 0.96774194 0.93548387 0.96774194] mean value: 0.9337701612903225 key: train_accuracy value: [0.96478873 0.95422535 0.95422535 0.9471831 0.95774648 0.95422535 0.96842105 0.95087719 0.95438596 0.95438596] mean value: 0.9560464541635779 key: test_fscore value: [0.84848485 0.875 0.90322581 0.96969697 0.94117647 1. 0.93333333 0.96551724 0.9375 0.96969697] mean value: 0.934363163963128 key: train_fscore value: [0.96527778 0.95532646 0.9550173 0.94809689 0.95833333 0.95532646 0.96907216 0.95205479 0.95532646 0.95470383] mean value: 0.9568535471627235 key: test_precision value: [0.82352941 0.875 0.93333333 0.94117647 0.88888889 1. 0.93333333 1. 
0.9375 0.94117647] mean value: 0.9273937908496732 key: train_precision value: [0.95205479 0.93288591 0.93877551 0.93197279 0.94520548 0.93288591 0.9527027 0.93288591 0.93288591 0.94482759] mean value: 0.9397082486363004 key: test_recall value: [0.875 0.875 0.875 1. 1. 1. 0.93333333 0.93333333 0.9375 1. ] mean value: 0.9429166666666666 key: train_recall value: [0.97887324 0.97887324 0.97183099 0.96478873 0.97183099 0.97887324 0.98601399 0.97202797 0.97887324 0.96478873] mean value: 0.9746774352408155 key: test_roc_auc value: [0.84375 0.875 0.90625 0.96875 0.9375 1. 0.93541667 0.96666667 0.93541667 0.96666667] mean value: 0.9335416666666667 key: train_roc_auc value: [0.96478873 0.95422535 0.95422535 0.9471831 0.95774648 0.95422535 0.96835911 0.95080272 0.95447158 0.95442234] mean value: 0.9560450113267015 key: test_jcc value: [0.73684211 0.77777778 0.82352941 0.94117647 0.88888889 1. 0.875 0.93333333 0.88235294 0.94117647] mean value: 0.8800077399380805 key: train_jcc value: [0.93288591 0.91447368 0.91390728 0.90131579 0.92 0.91447368 0.94 0.90849673 0.91447368 0.91333333] mean value: 0.9173360098273221 MCC on Blind test: 0.22 Accuracy on Blind test: 0.49 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:183: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:186: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic 
Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: 
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.16565156 0.08813143 0.17730904 0.20824528 0.18379951 0.1740315 0.17967129 0.17941952 0.18022633 0.19038606] mean value: 0.17268714904785157 key: score_time value: [0.0107646 0.01237965 0.01942182 0.01081586 0.01998901 0.01066494 0.01124215 0.01264286 0.01966715 0.02074218] mean value: 0.014833021163940429 key: test_mcc value: [0.81409158 0.75 0.81409158 1. 0.8819171 1. 0.87083333 1. 1. 0.87770745] mean value: 0.900864104531543 key: train_mcc value: [0.95129413 0.94450549 0.94403659 0.93720088 0.94403659 0.93720088 0.93741093 0.93741093 0.93130575 0.95146839] mean value: 0.9415870567033926 key: test_accuracy value: [0.90625 0.875 0.90625 1. 0.9375 1. 0.93548387 1. 1. 0.93548387] mean value: 0.9495967741935484 key: train_accuracy value: [0.97535211 0.97183099 0.97183099 0.96830986 0.97183099 0.96830986 0.96842105 0.96842105 0.96491228 0.9754386 ] mean value: 0.9704657771188535 key: test_fscore value: [0.90909091 0.875 0.90322581 1. 0.94117647 1. 0.93333333 1. 1. 
0.94117647] mean value: 0.9503002990052326 key: train_fscore value: [0.97577855 0.97241379 0.97222222 0.96885813 0.97222222 0.96885813 0.96907216 0.96907216 0.96575342 0.97577855] mean value: 0.9710029348503718 key: test_precision value: [0.88235294 0.875 0.93333333 1. 0.88888889 1. 0.93333333 1. 1. 0.88888889] mean value: 0.9401797385620915 key: train_precision value: [0.95918367 0.9527027 0.95890411 0.95238095 0.95890411 0.95238095 0.9527027 0.9527027 0.94 0.95918367] mean value: 0.953904557898687 key: test_recall value: [0.9375 0.875 0.875 1. 1. 1. 0.93333333 1. 1. 1. ] mean value: 0.9620833333333333 key: train_recall value: [0.99295775 0.99295775 0.98591549 0.98591549 0.98591549 0.98591549 0.98601399 0.98601399 0.99295775 0.99295775] mean value: 0.9887520929774452 key: test_roc_auc value: [0.90625 0.875 0.90625 1. 0.9375 1. 0.93541667 1. 1. 0.93333333] mean value: 0.949375 key: train_roc_auc value: [0.97535211 0.97183099 0.97183099 0.96830986 0.97183099 0.96830986 0.96835911 0.96835911 0.96501034 0.97549985] mean value: 0.9704693194129814 key: test_jcc value: [0.83333333 0.77777778 0.82352941 1. 0.88888889 1. 0.875 1. 1. 
0.88888889] mean value: 0.9087418300653595 key: train_jcc value: [0.9527027 0.94630872 0.94594595 0.93959732 0.94594595 0.93959732 0.94 0.94 0.93377483 0.9527027 ] mean value: 0.9436575487439082 MCC on Blind test: 0.2 Accuracy on Blind test: 0.43 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep',
 ColumnTransformer(remainder='passthrough',
 transformers=[('num', MinMaxScaler(),
 Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')),
 ('cat', OneHotEncoder(),
 Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
 ('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.02582741 0.02434468 0.02545023 0.02682304 0.02640653 0.02720022 0.02612591 0.02765298 0.02734327 0.04823351]
mean value: 0.028540778160095214
key: score_time
value: [0.01104569 0.01089406 0.01136374 0.01076293 0.01096463 0.01084781 0.01096702 0.01096272 0.01116037 0.01098251]
mean value: 0.010995149612426758
key: test_mcc
value: [0.81325006 0.87831007 0.80813523 0.78446454 0.77459667 0.83914639 0.80813523 0.90748521 0.73763441 0.77382584]
mean value: 0.8124983647487063
key: train_mcc
value: [0.83119879 0.83472681 0.83507281 0.87790234 0.85985131 0.84227171 0.84207536 0.83472681 0.85645761 0.83886705]
mean value: 0.8453150611021845
key: test_accuracy
value: [0.90322581 0.93548387 0.90322581 0.88709677 0.88709677 0.91935484 0.90322581 0.9516129 0.86885246 0.8852459 ]
mean value:
0.9044420941300899
key: train_accuracy
value: [0.91546763 0.91726619 0.91726619 0.93884892 0.92985612 0.92086331 0.92086331 0.91726619 0.92818671 0.91921005]
mean value: 0.9225094610128773
key: test_fscore
value: [0.90909091 0.93939394 0.90625 0.87719298 0.88888889 0.92063492 0.90625 0.95384615 0.86666667 0.89230769]
mean value: 0.9060522153285311
key: train_fscore
value: [0.91651865 0.91814947 0.91872792 0.93950178 0.93048128 0.92226148 0.92198582 0.91814947 0.92882562 0.92035398]
mean value: 0.9234955465227851
key: test_precision
value: [0.85714286 0.88571429 0.87878788 0.96153846 0.875 0.90625 0.87878788 0.91176471 0.86666667 0.85294118]
mean value: 0.887459391099097
key: train_precision
value: [0.90526316 0.9084507 0.90277778 0.92957746 0.92226148 0.90625 0.90909091 0.9084507 0.92226148 0.90592334]
mean value: 0.9120307031148476
key: test_recall
value: [0.96774194 1. 0.93548387 0.80645161 0.90322581 0.93548387 0.93548387 1. 0.86666667 0.93548387]
mean value: 0.9286021505376344
key: train_recall
value: [0.92805755 0.92805755 0.9352518 0.94964029 0.93884892 0.93884892 0.9352518 0.92805755 0.93548387 0.9352518 ]
mean value: 0.9352750058018101
key: test_roc_auc
value: [0.90322581 0.93548387 0.90322581 0.88709677 0.88709677 0.91935484 0.90322581 0.9516129 0.8688172 0.8844086 ]
mean value: 0.9043548387096774
key: train_roc_auc
value: [0.91546763 0.91726619 0.91726619 0.93884892 0.92985612 0.92086331 0.92086331 0.91726619 0.92817359 0.9192388 ]
mean value: 0.922511023439313
key: test_jcc
value: [0.83333333 0.88571429 0.82857143 0.78125 0.8 0.85294118 0.82857143 0.91176471 0.76470588 0.80555556]
mean value: 0.8292407796451914
key: train_jcc
value: [0.84590164 0.84868421 0.8496732 0.88590604 0.87 0.8557377 0.85526316 0.84868421 0.86710963 0.85245902]
mean value: 0.8579418817037436
MCC on Blind test: 0.21
Accuracy on Blind test: 0.53
Model_name: Logistic RegressionCV
Model func: LogisticRegressionCV(random_state=42)
List of models:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.77656746 0.70605206 0.85919523 0.69673634 0.68120766 0.78336358 0.78208661 0.70059681 0.83346748 0.76766825] mean value: 0.7586941480636596 key: score_time value: [0.01191044 0.01261759 0.01475716 0.01256537 0.0127914 0.01143336 0.01280951 0.01418829 0.01239324 0.01240849] mean value: 0.012787485122680664 key: test_mcc value: [0.90369611 0.93743687 0.90369611 0.82199494 0.84266484 0.93743687 0.90369611 0.87278605 0.87055472 0.96770777] mean value: 0.8961670394093372 key: train_mcc value: [0.97124816 0.96043787 0.96402878 0.94966486 0.9497386 0.96402878 0.94986154 0.96405373 0.97487139 0.96768995] mean value: 0.9615623654982854 key: test_accuracy value: [0.9516129 0.96774194 0.9516129 0.90322581 0.91935484 0.96774194 0.9516129 0.93548387 0.93442623 0.98360656] mean value: 0.9466419883659439 
key: train_accuracy value: [0.98561151 0.98021583 0.98201439 0.97482014 0.97482014 0.98201439 0.97482014 0.98201439 0.98743268 0.98384201] mean value: 0.9807605621068675 key: test_fscore value: [0.95081967 0.96666667 0.95081967 0.89285714 0.92307692 0.96875 0.95238095 0.93333333 0.93103448 0.98412698] mean value: 0.9453865829462917 key: train_fscore value: [0.98566308 0.98018018 0.98201439 0.97491039 0.975 0.98201439 0.97508897 0.98207885 0.98747764 0.98378378] mean value: 0.9808211677303444 key: test_precision value: [0.96666667 1. 0.96666667 1. 0.88235294 0.93939394 0.9375 0.96551724 0.96428571 0.96875 ] mean value: 0.9591133169568768 key: train_precision value: [0.98214286 0.98194946 0.98201439 0.97142857 0.96808511 0.98201439 0.96478873 0.97857143 0.98571429 0.98555957] mean value: 0.9782268783883663 key: test_recall value: [0.93548387 0.93548387 0.93548387 0.80645161 0.96774194 1. 0.96774194 0.90322581 0.9 1. ] mean value: 0.9351612903225807 key: train_recall value: [0.98920863 0.97841727 0.98201439 0.97841727 0.98201439 0.98201439 0.98561151 0.98561151 0.98924731 0.98201439] mean value: 0.9834571052835152 key: test_roc_auc value: [0.9516129 0.96774194 0.9516129 0.90322581 0.91935484 0.96774194 0.9516129 0.93548387 0.93387097 0.98333333] mean value: 0.9465591397849463 key: train_roc_auc value: [0.98561151 0.98021583 0.98201439 0.97482014 0.97482014 0.98201439 0.97482014 0.98201439 0.98742941 0.98383874] mean value: 0.9807599082024703 key: test_jcc value: [0.90625 0.93548387 0.90625 0.80645161 0.85714286 0.93939394 0.90909091 0.875 0.87096774 0.96875 ] mean value: 0.8974780931434158 key: train_jcc value: [0.97173145 0.96113074 0.96466431 0.95104895 0.95121951 0.96466431 0.95138889 0.96478873 0.97526502 0.96808511] mean value: 0.9623987021299 MCC on Blind test: 0.14 Accuracy on Blind test: 0.35 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep',
ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.01096439 0.01291323 0.00820684 0.00848889 0.00764108 0.0079782 0.00760031 0.00791764 0.00761533 0.00789833] mean value: 0.008722424507141113 key: score_time value: [0.01091075 0.00868392 0.00824642 0.00876021 0.00801086 0.00799608 0.00807261 0.00803852 0.00840473 0.00831747] mean value: 0.008544158935546876 key: test_mcc value: [0.67883359 0.64549722 0.7130241 0.52981294 0.74193548 0.7130241 0.80813523 0.81325006 0.50860215 0.77072165] mean value: 0.6922836529141403 key: train_mcc value: [0.71239616 0.71972253 0.72313855 0.6419512 0.73033396 0.70874774 0.69849277 0.6908084 0.72712387 0.72023891] mean value: 0.7072954079422489 key: test_accuracy value: [0.83870968 0.82258065 0.85483871 0.75806452 0.87096774 0.85483871 0.90322581 0.90322581 0.75409836 0.8852459 ] mean value: 0.8445795875198308 key: train_accuracy value: [0.85611511 0.85971223 0.86151079 0.82014388 0.86510791 0.85431655 0.84892086 0.84532374 0.86355476 0.85996409] mean value: 0.8534669930124124 key: test_fscore value: [0.84375 0.82539683 0.86153846 0.72727273 0.87096774 0.86153846 0.90625 0.90909091 0.75409836 0.88888889] mean value: 0.8448792376317495 key: train_fscore value: [0.85765125 0.86170213 0.8627451 0.81343284 0.86631016 
0.85561497 0.85211268 0.84697509 0.86428571 0.86170213] mean value: 0.8542532047730724 key: test_precision value: [0.81818182 0.8125 0.82352941 0.83333333 0.87096774 0.82352941 0.87878788 0.85714286 0.74193548 0.875 ] mean value: 0.833490793678175 key: train_precision value: [0.84859155 0.84965035 0.85512367 0.84496124 0.85865724 0.84805654 0.83448276 0.83802817 0.86120996 0.84965035] mean value: 0.8488411836784526 key: test_recall value: [0.87096774 0.83870968 0.90322581 0.64516129 0.87096774 0.90322581 0.93548387 0.96774194 0.76666667 0.90322581] mean value: 0.8605376344086022 key: train_recall value: [0.86690647 0.87410072 0.8705036 0.78417266 0.87410072 0.86330935 0.8705036 0.85611511 0.86738351 0.87410072] mean value: 0.8601196462185091 key: test_roc_auc value: [0.83870968 0.82258065 0.85483871 0.75806452 0.87096774 0.85483871 0.90322581 0.90322581 0.75430108 0.88494624] mean value: 0.8445698924731183 key: train_roc_auc value: [0.85611511 0.85971223 0.86151079 0.82014388 0.86510791 0.85431655 0.84892086 0.84532374 0.86354787 0.85998943] mean value: 0.8534688378329595 key: test_jcc value: [0.72972973 0.7027027 0.75675676 0.57142857 0.77142857 0.75675676 0.82857143 0.83333333 0.60526316 0.8 ] mean value: 0.7355971008602588 key: train_jcc value: [0.75077882 0.75700935 0.75862069 0.68553459 0.76415094 0.74766355 0.74233129 0.7345679 0.76100629 0.75700935] mean value: 0.7458672762322701 MCC on Blind test: 0.21 Accuracy on Blind test: 0.57 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00830388 0.00819135 0.00802517 0.00794339 0.00828314 0.00875449 0.00812817 0.0083952 0.0085578 0.00874829] mean value: 0.008333086967468262 key: score_time value: [0.00846505 0.00840521 0.008214 0.00820637 0.00816274 0.00835466 0.00821352 0.00871086 0.0093646 0.00880384] mean value: 0.00849008560180664 key: test_mcc value: [0.51639778 0.56761348 0.61290323 0.65372045 0.74348441 0.5809475 0.58834841 0.7130241 0.58264312 0.54086022] mean value: 0.6099942679846233 key: train_mcc value: [0.62249953 0.6079176 0.63414469 0.60794907 0.59713776 0.61543051 0.64482423 0.62249953 0.6375268 0.6122178 ] mean value: 0.620214750789007 key: test_accuracy value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258 0.79032258 0.85483871 0.78688525 0.7704918 ] mean value: 0.8025118984664199 key: train_accuracy value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396 0.82194245 0.81115108 0.81867145 0.80610413] mean value: 0.8099595727367837 key: test_fscore value: [0.75409836 0.74074074 0.80645161 0.80701754 0.875 0.79365079 0.80597015 0.86153846 0.8 0.77419355] mean value: 0.8018661210989436 key: train_fscore value: [0.80874317 0.8036036 0.82167832 0.80500894 0.79928315 0.80438757 0.82661996 0.81349911 0.82123894 0.80505415] mean value: 0.8109116928454192 key: test_precision value: [0.76666667 0.86956522 0.80645161 0.88461538 0.84848485 0.78125 0.75 0.82352941 0.74285714 0.77419355] mean value: 0.8047613833070375 key: train_precision value: [0.81918819 0.80505415 0.79931973 0.80071174 0.79642857 0.81784387 0.80546075 0.80350877 
0.81118881 0.80797101] mean value: 0.8066675601234072 key: test_recall value: [0.74193548 0.64516129 0.80645161 0.74193548 0.90322581 0.80645161 0.87096774 0.90322581 0.86666667 0.77419355] mean value: 0.8060215053763441 key: train_recall value: [0.79856115 0.80215827 0.84532374 0.80935252 0.80215827 0.79136691 0.84892086 0.82374101 0.83154122 0.80215827] mean value: 0.8155282225832238 key: test_roc_auc value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258 0.79032258 0.85483871 0.78817204 0.77043011] mean value: 0.8026344086021505 key: train_roc_auc value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396 0.82194245 0.81115108 0.81864831 0.80609706] mean value: 0.8099565508883215 key: test_jcc value: [0.60526316 0.58823529 0.67567568 0.67647059 0.77777778 0.65789474 0.675 0.75675676 0.66666667 0.63157895] mean value: 0.6711319601335082 key: train_jcc value: [0.67889908 0.67168675 0.69732938 0.67365269 0.66567164 0.67278287 0.70447761 0.68562874 0.6966967 0.67371601] mean value: 0.6820541480667476 MCC on Blind test: 0.18 Accuracy on Blind test: 0.52 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00809216 0.00795221 0.00791621 0.00790453 0.00721383 0.00737166 0.00789046 0.00771284 0.00766468 0.00788069] mean value: 0.0077599287033081055 key: score_time value: [0.01314855 0.01184964 0.0114398 0.0146842 0.01099205 0.01099849 0.01176476 0.0116837 0.01157475 0.01158309] mean value: 0.01197190284729004 key: test_mcc value: [0.45760432 0.48488114 0.67883359 0.55301004 0.67883359 0.67883359 0.54953196 0.74348441 0.40967742 0.70780713] mean value: 0.5942497191157756 key: train_mcc value: [0.7125253 0.73779681 0.71605437 0.74499483 0.7125253 0.71313508 0.726788 0.73745301 0.75237261 0.72554668] mean value: 0.7279191995608599 key: test_accuracy value: [0.72580645 0.74193548 0.83870968 0.77419355 0.83870968 0.83870968 0.77419355 0.87096774 0.70491803 0.85245902] mean value: 0.7960602855631941 key: train_accuracy value: [0.85611511 0.86870504 0.85791367 0.87230216 0.85611511 0.85611511 0.86330935 0.86870504 0.87612208 0.86175943] mean value: 0.8637162083618563 key: test_fscore value: [0.70175439 0.75 0.83333333 0.75862069 0.83333333 0.84375 0.78125 0.86666667 0.7 0.86153846] mean value: 0.7930246870491879 key: train_fscore value: [0.8540146 0.86654479 0.856102 0.8702011 0.8540146 0.85239852 0.86181818 0.86799277 0.87522604 0.85607477] mean value: 0.8614387366046266 key: test_precision value: [0.76923077 0.72727273 0.86206897 0.81481481 0.86206897 0.81818182 0.75757576 0.89655172 0.7 0.82352941] mean value: 0.8031294954013006 key: train_precision value: [0.86666667 0.88104089 0.86715867 0.88475836 0.86666667 0.875 0.87132353 0.87272727 
0.88321168 0.89105058] mean value: 0.8759604326054368 key: test_recall value: [0.64516129 0.77419355 0.80645161 0.70967742 0.80645161 0.87096774 0.80645161 0.83870968 0.7 0.90322581] mean value: 0.7861290322580645 key: train_recall value: [0.84172662 0.85251799 0.84532374 0.85611511 0.84172662 0.83093525 0.85251799 0.86330935 0.86738351 0.82374101] mean value: 0.8475297181609551 key: test_roc_auc value: [0.72580645 0.74193548 0.83870968 0.77419355 0.83870968 0.83870968 0.77419355 0.87096774 0.70483871 0.8516129 ] mean value: 0.7959677419354838 key: train_roc_auc value: [0.85611511 0.86870504 0.85791367 0.87230216 0.85611511 0.85611511 0.86330935 0.86870504 0.8761378 0.86169129] mean value: 0.8637109667105026 key: test_jcc value: [0.54054054 0.6 0.71428571 0.61111111 0.71428571 0.72972973 0.64102564 0.76470588 0.53846154 0.75675676] mean value: 0.6610902628549687 key: train_jcc value: [0.74522293 0.76451613 0.74840764 0.77022654 0.74522293 0.74276527 0.7571885 0.76677316 0.77813505 0.74836601] mean value: 0.7566824165390956 MCC on Blind test: 0.16 Accuracy on Blind test: 0.57 Model_name: SVM Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.01524782 0.01526904 0.01670051 0.01508474 0.01485252 0.01501393 0.01506996 0.01522017 0.01478338 0.01487541] mean value: 0.015211749076843261 key: score_time value: [0.00945497 0.00925422 0.00928378 0.00928211 0.00977159 0.00912595 0.00927424 0.00921845 0.00913382 0.00917101] mean value: 0.009297013282775879 key: test_mcc value: [0.64820372 0.75623534 0.80813523 0.71004695 0.74819006 0.7284928 0.7190925 0.70116959 0.61256703 0.6844511 ] mean value: 0.7116584311085777 key: train_mcc value: [0.78485761 0.79151169 0.79209132 0.85451608 0.77632088 0.78285538 0.75529076 0.75529076 0.78851732 0.80529218] mean value: 0.7886543984062245 key: test_accuracy value: [0.82258065 0.87096774 0.90322581 0.85483871 0.87096774 0.85483871 0.85483871 0.83870968 0.80327869 0.83606557] mean value: 0.8510312004230566 key: train_accuracy value: [0.89028777 0.89388489 0.89388489 0.92625899 0.88489209 0.88848921 0.87410072 0.87410072 0.89228007 0.90125673] mean value: 0.8919436084884337 key: test_fscore value: [0.83076923 0.88235294 0.90625 0.85245902 0.87878788 0.86956522 0.86567164 0.85714286 0.8125 0.85294118] mean value: 0.8608439959922817 key: train_fscore value: [0.8957265 0.89879931 0.8991453 0.92869565 0.89189189 0.89491525 0.88215488 0.88215488 0.89761092 0.90500864] mean value: 0.8976103228458596 key: test_precision value: [0.79411765 0.81081081 0.87878788 0.86666667 0.82857143 0.78947368 0.80555556 0.76923077 0.76470588 0.78378378] mean value: 0.8091704107029185 key: train_precision value: [0.8534202 0.85901639 0.85667752 0.8989899 0.84076433 0.84615385 
0.82911392 0.82911392 0.85667752 0.87043189] mean value: 0.8540359455885207 key: test_recall value: [0.87096774 0.96774194 0.93548387 0.83870968 0.93548387 0.96774194 0.93548387 0.96774194 0.86666667 0.93548387] mean value: 0.9221505376344086 key: train_recall value: [0.94244604 0.94244604 0.94604317 0.96043165 0.94964029 0.94964029 0.94244604 0.94244604 0.94265233 0.94244604] mean value: 0.9460637941259895 key: test_roc_auc value: [0.82258065 0.87096774 0.90322581 0.85483871 0.87096774 0.85483871 0.85483871 0.83870968 0.80430108 0.8344086 ] mean value: 0.8509677419354839 key: train_roc_auc value: [0.89028777 0.89388489 0.89388489 0.92625899 0.88489209 0.88848921 0.87410072 0.87410072 0.89218947 0.90133055] mean value: 0.8919419303267063 key: test_jcc value: [0.71052632 0.78947368 0.82857143 0.74285714 0.78378378 0.76923077 0.76315789 0.75 0.68421053 0.74358974] mean value: 0.7565401289085499 key: train_jcc value: [0.81114551 0.81619938 0.81677019 0.86688312 0.80487805 0.80981595 0.78915663 0.78915663 0.81424149 0.82649842] mean value: 0.8144745352495302 MCC on Blind test: 0.26 Accuracy on Blind test: 0.5 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [1.6481297 1.49946976 1.67077136 1.65997696 1.63008213 1.52724123 1.67578554 1.65596056 1.49459696 1.68944907] mean value: 1.6151463270187378 key: score_time value: [0.01430917 0.01388526 0.01319432 0.01351166 0.01167202 0.01358342 0.01357841 0.01354527 0.01401711 0.01371384] mean value: 0.01350104808807373 key: test_mcc value: [0.96824584 0.96824584 0.93548387 0.7190925 0.90369611 0.93743687 1. 1. 0.83655914 1. ] mean value: 0.9268760160039228 key: train_mcc value: [0.99280576 0.99283145 0.99640932 1. 0.99283145 0.99283145 0.99283145 0.99283145 0.99284434 0.98923428] mean value: 0.9935450945650737 key: test_accuracy value: [0.98387097 0.98387097 0.96774194 0.85483871 0.9516129 0.96774194 1. 1. 0.91803279 1. ] mean value: 0.9627710206240084 key: train_accuracy value: [0.99640288 0.99640288 0.99820144 1. 0.99640288 0.99640288 0.99640288 0.99640288 0.99640934 0.994614 ] mean value: 0.9967642044353745 key: test_fscore value: [0.98360656 0.98360656 0.96774194 0.84210526 0.95081967 0.96875 1. 1. 0.91803279 1. ] mean value: 0.9614662772412258 key: train_fscore value: [0.99640288 0.99638989 0.9981982 1. 
0.99638989 0.99638989 0.99638989 0.99638989 0.99640288 0.99459459] mean value: 0.9967548006672231 key: test_precision value: [1. 1. 0.96774194 0.92307692 0.96666667 0.93939394 1. 1. 0.90322581 1. ] mean value: 0.9700105271073013 key: train_precision value: [0.99640288 1. 1. 1. 1. 1. 1. 1. 1. 0.99638989] mean value: 0.9992792769394593 key: test_recall value: [0.96774194 0.96774194 0.96774194 0.77419355 0.93548387 1. 1. 1. 0.93333333 1. ] mean value: 0.9546236559139785 key: train_recall value: [0.99640288 0.99280576 0.99640288 1. 0.99280576 0.99280576 0.99280576 0.99280576 0.99283154 0.99280576] mean value: 0.9942471828988423 key: test_roc_auc value: [0.98387097 0.98387097 0.96774194 0.85483871 0.9516129 0.96774194 1. 1. 0.91827957 1. ] mean value: 0.9627956989247312 key: train_roc_auc value: [0.99640288 0.99640288 0.99820144 1. 0.99640288 0.99640288 0.99640288 0.99640288 0.99641577 0.99461076] mean value: 0.9967645238647792 key: test_jcc value: [0.96774194 0.96774194 0.9375 0.72727273 0.90625 0.93939394 1. 1. 0.84848485 1. ] mean value: 0.9294385386119257 key: train_jcc value: [0.99283154 0.99280576 0.99640288 1. 
0.99280576 0.99280576 0.99280576 0.99280576 0.99283154 0.98924731] mean value: 0.9935342048941492 MCC on Blind test: 0.09 Accuracy on Blind test: 0.24 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01340103 0.01229239 0.00977087 0.00982738 0.00973988 0.01050377 0.00967884 0.01013613 0.01014376 0.01027131] mean value: 0.010576534271240234 key: score_time value: [0.01074123 0.00902033 0.00799775 0.00793815 0.00800681 0.00789976 0.00837636 0.00792694 0.00824547 0.00833321] mean value: 0.008448600769042969 key: test_mcc value: [0.90748521 0.96824584 0.96824584 1. 0.93743687 0.93548387 0.93743687 0.93743687 0.9344086 0.96774194] mean value: 0.9493921894362165 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9516129 0.98387097 0.98387097 1. 0.96774194 0.96774194 0.96774194 0.96774194 0.96721311 0.98360656] mean value: 0.9741142252776309 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94915254 0.98360656 0.98412698 1. 
0.96666667 0.96774194 0.96666667 0.96666667 0.96666667 0.98360656] mean value: 0.9734901243404501 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96875 1. 1. 0.96774194 1. 1. 0.96666667 1. ] mean value: 0.9903158602150538 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.90322581 0.96774194 1. 1. 0.93548387 0.96774194 0.93548387 0.93548387 0.96666667 0.96774194] mean value: 0.9579569892473119 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9516129 0.98387097 0.98387097 1. 0.96774194 0.96774194 0.96774194 0.96774194 0.9672043 0.98387097] mean value: 0.9741397849462365 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.90322581 0.96774194 0.96875 1. 0.93548387 0.9375 0.93548387 0.93548387 0.93548387 0.96774194] mean value: 0.9486895161290323 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.01 Accuracy on Blind test: 0.2 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.10707998 0.10903525 0.10817385 0.10511184 0.10628986 0.10499215 0.10362315 0.10446763 0.10430741 0.10113478] mean value: 0.10542159080505371 key: score_time value: [0.01860476 0.01862955 0.01860476 0.01870513 0.01833129 0.01816988 0.01843429 0.01767302 0.01715016 0.01741219] mean value: 0.01817150115966797 key: test_mcc value: [0.93548387 1. 0.93548387 0.87831007 0.90369611 0.93743687 1. 0.96824584 0.90215054 0.93635873] mean value: 0.9397165895399419 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.96774194 1. 0.96774194 0.93548387 0.9516129 0.96774194 1. 0.98387097 0.95081967 0.96721311] mean value: 0.9692226335272343 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.96774194 1. 0.96774194 0.93103448 0.95081967 0.96875 1. 0.98412698 0.95081967 0.96875 ] mean value: 0.9689784682115642 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [0.96774194 1. 0.96774194 1. 0.96666667 0.93939394 1. 0.96875 0.93548387 0.93939394] mean value: 0.9685172287390029 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96774194 1. 0.96774194 0.87096774 0.93548387 1. 1. 1. 0.96666667 1. ] mean value: 0.9708602150537634 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.96774194 1. 0.96774194 0.93548387 0.9516129 0.96774194 1. 0.98387097 0.95107527 0.96666667] mean value: 0.9691935483870968 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.9375 1. 0.9375 0.87096774 0.90625 0.93939394 1. 0.96875 0.90625 0.93939394] mean value: 0.9406005620723363 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.2 Accuracy on Blind test: 0.36 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, 
learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00863886 0.00797391 0.0083859 0.00775075 0.00766373 0.0079093 0.00830865 0.00843334 0.00793123 0.00765133] mean value: 0.008064699172973634 key: score_time value: [0.00806904 0.00858569 0.00859904 0.00799298 0.00799918 0.00797725 0.00856709 0.00818801 0.00789118 0.00795794] mean value: 0.008182740211486817 key: test_mcc value: [0.75623534 0.87831007 0.87278605 0.83914639 0.84266484 0.64820372 0.74348441 0.90748521 0.77072165 0.83655914] mean value: 0.8095596827565272 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.87096774 0.93548387 0.93548387 0.91935484 0.91935484 0.82258065 0.87096774 0.9516129 0.8852459 0.91803279] mean value: 0.9029085140137494 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.93103448 0.93333333 0.92063492 0.91525424 0.81355932 0.86666667 0.94915254 0.88135593 0.91803279] mean value: 0.8986167081319949 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.96 1. 0.96551724 0.90625 0.96428571 0.85714286 0.89655172 1. 0.89655172 0.93333333] mean value: 0.9379632594417078 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.77419355 0.87096774 0.90322581 0.93548387 0.87096774 0.77419355 0.83870968 0.90322581 0.86666667 0.90322581] mean value: 0.8640860215053763 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_roc_auc value: [0.87096774 0.93548387 0.93548387 0.91935484 0.91935484 0.82258065 0.87096774 0.9516129 0.88494624 0.91827957] mean value: 0.9029032258064517 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.75 0.87096774 0.875 0.85294118 0.84375 0.68571429 0.76470588 0.90322581 0.78787879 0.84848485] mean value: 0.8182668529288548 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.26 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.34418821 1.34185529 1.3479538 1.36781883 1.42743945 1.3655982 1.38340139 1.37809682 1.39602447 1.33490944] mean value: 1.3687285900115966 key: score_time value: [0.09742594 0.09719825 0.09524751 0.09951448 0.09094286 0.0994525 0.09763288 0.09727025 0.09892535 0.09526753] mean value: 0.09688775539398194 key: test_mcc value: [0.96824584 0.96824584 0.93548387 0.96824584 0.96824584 0.96824584 1. 1. 0.90215054 1. ] mean value: 0.9678863591361422 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_accuracy value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097 1. 1. 0.95081967 1. ] mean value: 0.9837916446324696 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98360656 0.98360656 0.96774194 0.98360656 0.98412698 0.98412698 1. 1. 0.95081967 1. ] mean value: 0.9837635248000134 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96774194 1. 0.96875 0.96875 1. 1. 0.93548387 1. ] mean value: 0.9840725806451613 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96774194 0.96774194 0.96774194 0.96774194 1. 1. 1. 1. 0.96666667 1. ] mean value: 0.983763440860215 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097 1. 1. 0.95107527 1. ] mean value: 0.9838172043010753 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96774194 0.96774194 0.9375 0.96774194 0.96875 0.96875 1. 1. 0.90625 1. ] mean value: 0.9684475806451613 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.19 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.99160314 0.89481354 0.90568662 0.9475925 0.90248585 0.94201088 0.93975306 0.95513034 0.88768649 0.92357564] mean value: 0.9290338039398194 key: score_time value: [0.15050101 0.24627447 0.24356008 0.27248359 0.27095199 0.25157189 0.20301151 0.27629042 0.26423383 0.23688626] mean value: 0.24157650470733644 key: test_mcc value: [0.93548387 0.96824584 0.93548387 0.96824584 0.90748521 0.96824584 1. 0.96824584 0.87082935 0.96770777] mean value: 0.9489973426546622 key: train_mcc value: [0.96425338 0.96058703 0.96425338 0.96058703 0.96412858 0.97132357 0.95353974 0.96412858 0.96783888 0.96065866] mean value: 0.9631298857914714 key: test_accuracy value: [0.96774194 0.98387097 0.96774194 0.98387097 0.9516129 0.98387097 1. 0.98387097 0.93442623 0.98360656] mean value: 0.9740613432046537 key: train_accuracy value: [0.98201439 0.98021583 0.98201439 0.98021583 0.98201439 0.98561151 0.97661871 0.98201439 0.98384201 0.98025135] mean value: 0.9814812781731527 key: test_fscore value: [0.96774194 0.98360656 0.96774194 0.98360656 0.95384615 0.98412698 1. 
0.98360656 0.93548387 0.98412698] mean value: 0.9743887536166753 key: train_fscore value: [0.98220641 0.98039216 0.98220641 0.98039216 0.98214286 0.98571429 0.97690941 0.98214286 0.98401421 0.98039216] mean value: 0.9816512905421962 key: test_precision value: [0.96774194 1. 0.96774194 1. 0.91176471 0.96875 1. 1. 0.90625 0.96875 ] mean value: 0.9690998576850095 key: train_precision value: [0.97183099 0.97173145 0.97183099 0.97173145 0.9751773 0.9787234 0.96491228 0.9751773 0.97535211 0.97173145] mean value: 0.9728198725682946 key: test_recall value: [0.96774194 0.96774194 0.96774194 0.96774194 1. 1. 1. 0.96774194 0.96666667 1. ] mean value: 0.9805376344086022 key: train_recall value: [0.99280576 0.98920863 0.99280576 0.98920863 0.98920863 0.99280576 0.98920863 0.98920863 0.99283154 0.98920863] mean value: 0.9906500605966839 key: test_roc_auc value: [0.96774194 0.98387097 0.96774194 0.98387097 0.9516129 0.98387097 1. 0.98387097 0.93494624 0.98333333] mean value: 0.9740860215053764 key: train_roc_auc value: [0.98201439 0.98021583 0.98201439 0.98021583 0.98201439 0.98561151 0.97661871 0.98201439 0.98382584 0.9802674 ] mean value: 0.9814812665996235 key: test_jcc value: [0.9375 0.96774194 0.9375 0.96774194 0.91176471 0.96875 1. 
0.96774194 0.87878788 0.96875 ] mean value: 0.9506278391121845 key: train_jcc value: [0.96503497 0.96153846 0.96503497 0.96153846 0.96491228 0.97183099 0.95486111 0.96491228 0.96853147 0.96153846] mean value: 0.9639733441646896 MCC on Blind test: 0.1 Accuracy on Blind test: 0.23 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01971221 0.00761104 0.00768089 0.00756931 0.00756836 0.00765538 0.00759244 0.00763845 0.00757504 0.00766015] mean value: 0.008826327323913575 key: score_time value: [0.01263118 0.00788474 0.00787878 0.00782609 0.00785947 0.00789833 0.00783944 0.00784731 0.00786543 0.00787163] mean value: 0.008340239524841309 key: test_mcc value: [0.51639778 0.56761348 0.61290323 0.65372045 0.74348441 0.5809475 0.58834841 0.7130241 0.58264312 0.54086022] mean value: 0.6099942679846233 key: train_mcc value: [0.62249953 0.6079176 0.63414469 0.60794907 0.59713776 0.61543051 0.64482423 0.62249953 0.6375268 0.6122178 ] mean value: 0.620214750789007 key: test_accuracy value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258 0.79032258 0.85483871 0.78688525 0.7704918 ] mean value: 0.8025118984664199 key: 
train_accuracy value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396 0.82194245 0.81115108 0.81867145 0.80610413] mean value: 0.8099595727367837 key: test_fscore value: [0.75409836 0.74074074 0.80645161 0.80701754 0.875 0.79365079 0.80597015 0.86153846 0.8 0.77419355] mean value: 0.8018661210989436 key: train_fscore value: [0.80874317 0.8036036 0.82167832 0.80500894 0.79928315 0.80438757 0.82661996 0.81349911 0.82123894 0.80505415] mean value: 0.8109116928454192 key: test_precision value: [0.76666667 0.86956522 0.80645161 0.88461538 0.84848485 0.78125 0.75 0.82352941 0.74285714 0.77419355] mean value: 0.8047613833070375 key: train_precision value: [0.81918819 0.80505415 0.79931973 0.80071174 0.79642857 0.81784387 0.80546075 0.80350877 0.81118881 0.80797101] mean value: 0.8066675601234072 key: test_recall value: [0.74193548 0.64516129 0.80645161 0.74193548 0.90322581 0.80645161 0.87096774 0.90322581 0.86666667 0.77419355] mean value: 0.8060215053763441 key: train_recall value: [0.79856115 0.80215827 0.84532374 0.80935252 0.80215827 0.79136691 0.84892086 0.82374101 0.83154122 0.80215827] mean value: 0.8155282225832238 key: test_roc_auc value: [0.75806452 0.77419355 0.80645161 0.82258065 0.87096774 0.79032258 0.79032258 0.85483871 0.78817204 0.77043011] mean value: 0.8026344086021505 key: train_roc_auc value: [0.81115108 0.80395683 0.81654676 0.80395683 0.79856115 0.80755396 0.82194245 0.81115108 0.81864831 0.80609706] mean value: 0.8099565508883215 key: test_jcc value: [0.60526316 0.58823529 0.67567568 0.67647059 0.77777778 0.65789474 0.675 0.75675676 0.66666667 0.63157895] mean value: 0.6711319601335082 key: train_jcc value: [0.67889908 0.67168675 0.69732938 0.67365269 0.66567164 0.67278287 0.70447761 0.68562874 0.6966967 0.67371601] mean value: 0.6820541480667476 MCC on Blind test: 0.18 Accuracy on Blind test: 0.52 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, 
random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.21862555 0.04956889 0.04996634 0.05186462 0.05506182 0.06219912 0.06107974 0.06241131 0.05737829 0.05969238] mean value: 0.07278480529785156 key: score_time value: [0.01031947 0.00971913 0.00969386 0.00995827 0.01020288 0.00984311 0.0096755 0.00973344 0.0099237 0.00953674] mean value: 0.009860610961914063 key: test_mcc value: [0.96824584 0.96824584 0.93548387 0.96824584 0.96824584 0.96824584 0.96824584 0.96824584 0.90215054 1. ] mean value: 0.9615355264465131 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097 0.98387097 0.98387097 0.95081967 1. 
] mean value: 0.9805658381808567 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98360656 0.98360656 0.96774194 0.98360656 0.98412698 0.98412698 0.98360656 0.98360656 0.95081967 1. ] mean value: 0.9804848362754233 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96774194 1. 0.96875 0.96875 1. 1. 0.93548387 1. ] mean value: 0.9840725806451613 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96774194 0.96774194 0.96774194 0.96774194 1. 1. 0.96774194 0.96774194 0.96666667 1. ] mean value: 0.9773118279569892 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 0.98387097 0.96774194 0.98387097 0.98387097 0.98387097 0.98387097 0.98387097 0.95107527 1. ] mean value: 0.9805913978494624 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.96774194 0.96774194 0.9375 0.96774194 0.96875 0.96875 0.96774194 0.96774194 0.90625 1. ] mean value: 0.9619959677419355 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.2 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01578832 0.04168701 0.05872059 0.01797581 0.01809096 0.03955126 0.04262829 0.01832151 0.01880884 0.0180583 ] mean value: 0.028963088989257812 key: score_time value: [0.01038313 0.01973009 0.01196766 0.01065159 0.01061916 0.02021313 0.02139711 0.01056767 0.01115489 0.01077628] mean value: 0.013746070861816406 key: test_mcc value: [0.93548387 1. 0.93548387 0.87831007 0.87831007 0.96824584 0.93743687 0.96824584 0.83655914 0.93635873] mean value: 0.9274434285640426 key: train_mcc value: [0.94283651 0.9393413 0.94305636 0.93563929 0.95353974 0.9393413 0.93914669 0.93214329 0.94994909 0.93925798] mean value: 0.941425155755879 key: test_accuracy value: [0.96774194 1. 0.96774194 0.93548387 0.93548387 0.98387097 0.96774194 0.98387097 0.91803279 0.96721311] mean value: 0.9627181385510312 key: train_accuracy value: [0.97122302 0.96942446 0.97122302 0.9676259 0.97661871 0.96942446 0.96942446 0.96582734 0.97486535 0.96947935] mean value: 0.9705136070676672 key: test_fscore value: [0.96774194 1. 
0.96774194 0.93103448 0.93939394 0.98412698 0.96875 0.98360656 0.91803279 0.96875 ] mean value: 0.9629178621509581 key: train_fscore value: [0.97163121 0.9699115 0.97173145 0.96808511 0.97690941 0.9699115 0.96980462 0.96637168 0.9751773 0.96980462] mean value: 0.9709338406138824 key: test_precision value: [0.96774194 1. 0.96774194 1. 0.88571429 0.96875 0.93939394 1. 0.90322581 0.93939394] mean value: 0.9571961841921519 key: train_precision value: [0.95804196 0.95470383 0.95486111 0.95454545 0.96491228 0.95470383 0.95789474 0.95121951 0.96491228 0.95789474] mean value: 0.9573689736486591 key: test_recall value: [0.96774194 1. 0.96774194 0.87096774 1. 1. 1. 0.96774194 0.93333333 1. ] mean value: 0.970752688172043 key: train_recall value: [0.98561151 0.98561151 0.98920863 0.98201439 0.98920863 0.98561151 0.98201439 0.98201439 0.98566308 0.98201439] mean value: 0.9848972434955261 key: test_roc_auc value: [0.96774194 1. 0.96774194 0.93548387 0.93548387 0.98387097 0.96774194 0.98387097 0.91827957 0.96666667] mean value: 0.9626881720430107 key: train_roc_auc value: [0.97122302 0.96942446 0.97122302 0.9676259 0.97661871 0.96942446 0.96942446 0.96582734 0.97484593 0.96950182] mean value: 0.970513911451484 key: test_jcc value: [0.9375 1. 
0.9375 0.87096774 0.88571429 0.96875 0.93939394 0.96774194 0.84848485 0.93939394] mean value: 0.9295446690406368 key: train_jcc value: [0.94482759 0.94158076 0.94501718 0.93814433 0.95486111 0.94158076 0.94137931 0.93493151 0.95155709 0.94137931] mean value: 0.9435258942337567 MCC on Blind test: 0.14 Accuracy on Blind test: 0.35 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.01899743 0.00779724 0.00781965 0.00752091 0.00765944 0.00745153 0.00752687 0.00762939 0.00754023 0.00756288] mean value: 0.008750557899475098 key: score_time value: [0.008394 0.00820088 0.00782681 0.00797677 0.00793958 0.00783062 0.00781059 0.0078342 0.00790739 0.00793242] mean value: 0.007965326309204102 key: test_mcc value: [0.61807005 0.74819006 0.67883359 0.64549722 0.67883359 0.63439154 0.63439154 0.67419986 0.54654832 0.64708149] mean value: 0.6506037256013296 key: train_mcc value: [0.66814183 0.65361701 0.66955589 0.67282515 0.64923736 0.67144111 0.67540424 0.6622781 0.67590132 0.66881107] mean value: 0.6667213081476084 key: test_accuracy value: [0.80645161 0.87096774 0.83870968 0.82258065 0.83870968 0.80645161 0.80645161 0.82258065 0.7704918 
0.81967213] mean value: 0.8203067160232681 key: train_accuracy value: [0.83093525 0.82374101 0.83093525 0.83273381 0.82014388 0.83273381 0.83453237 0.82733813 0.83482944 0.83123878] mean value: 0.8299161747801042 key: test_fscore value: [0.81818182 0.87878788 0.84375 0.82539683 0.84375 0.82857143 0.82857143 0.84507042 0.78125 0.8358209 ] mean value: 0.8329150697566978 key: train_fscore value: [0.84175084 0.83501684 0.84280936 0.84422111 0.83388704 0.84317032 0.84511785 0.83946488 0.84563758 0.84175084] mean value: 0.8412826664142349 key: test_precision value: [0.77142857 0.82857143 0.81818182 0.8125 0.81818182 0.74358974 0.74358974 0.75 0.73529412 0.77777778] mean value: 0.779911501896796 key: train_precision value: [0.79113924 0.78481013 0.7875 0.78996865 0.77469136 0.79365079 0.7943038 0.784375 0.79495268 0.79113924] mean value: 0.7886530890164406 key: test_recall value: [0.87096774 0.93548387 0.87096774 0.83870968 0.87096774 0.93548387 0.93548387 0.96774194 0.83333333 0.90322581] mean value: 0.896236559139785 key: train_recall value: [0.89928058 0.89208633 0.90647482 0.90647482 0.9028777 0.89928058 0.9028777 0.9028777 0.90322581 0.89928058] mean value: 0.9014736597818519 key: test_roc_auc value: [0.80645161 0.87096774 0.83870968 0.82258065 0.83870968 0.80645161 0.80645161 0.82258065 0.77150538 0.81827957] mean value: 0.8202688172043011 key: train_roc_auc value: [0.83093525 0.82374101 0.83093525 0.83273381 0.82014388 0.83273381 0.83453237 0.82733813 0.83470643 0.83136072] mean value: 0.8299160671462831 key: test_jcc value: [0.69230769 0.78378378 0.72972973 0.7027027 0.72972973 0.70731707 0.70731707 0.73170732 0.64102564 0.71794872] mean value: 0.7143569460642631 key: train_jcc value: [0.72674419 0.71676301 0.7283237 0.73043478 0.71509972 0.72886297 0.73177843 0.72334294 0.73255814 0.72674419] mean value: 0.7260652053436807 MCC on Blind test: 0.21 Accuracy on Blind test: 0.5 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, 
random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01070261 0.0129571 0.01364112 0.01318789 0.01269341 0.01534224 0.01468229 0.01412392 0.01440811 0.0143919 ] mean value: 0.013613057136535645 key: score_time value: [0.008075 0.01009893 0.00991964 0.01034665 0.01041341 0.01067472 0.0105691 0.01087594 0.01076126 0.01034617] mean value: 0.01020808219909668 key: test_mcc value: [0.82199494 0.93743687 0.93548387 0.81325006 0.87831007 0.74161985 0.90748521 0.83914639 0.72318666 0.30374645] mean value: 0.7901660359762814 key: train_mcc value: [0.87166214 0.92172241 0.94266562 0.92172241 0.91860435 0.69376766 0.94305636 0.93238486 0.88634645 0.2887174 ] mean value: 0.8320649673376139 key: test_accuracy value: [0.90322581 0.96774194 0.96774194 0.90322581 0.93548387 0.85483871 0.9516129 0.91935484 0.85245902 0.59016393] mean value: 0.8845848757271285 key: train_accuracy value: [0.93345324 0.96043165 0.97122302 0.96043165 0.95863309 0.82733813 0.97122302 0.96582734 0.94075404 0.57630162] mean value: 0.9065616806375366 key: test_fscore value: [0.89285714 0.96875 0.96774194 0.89655172 0.93939394 0.87323944 0.95384615 0.92063492 0.83018868 0.71264368] mean value: 0.895584761037988 key: train_fscore value: [0.92979127 
0.96126761 0.97153025 0.96126761 0.95971979 0.85185185 0.97173145 0.9664903 0.93761815 0.7020202 ] mean value: 0.9213288471474509 key: test_precision value: [1. 0.93939394 0.96774194 0.96296296 0.88571429 0.775 0.91176471 0.90625 0.95652174 0.55357143] mean value: 0.8858920997139276 key: train_precision value: [0.98393574 0.94137931 0.96126761 0.94137931 0.93515358 0.74594595 0.95486111 0.94809689 0.992 0.54085603] mean value: 0.8944875526911704 key: test_recall value: [0.80645161 1. 0.96774194 0.83870968 1. 1. 1. 0.93548387 0.73333333 1. ] mean value: 0.9281720430107527 key: train_recall value: [0.88129496 0.98201439 0.98201439 0.98201439 0.98561151 0.99280576 0.98920863 0.98561151 0.88888889 1. ] mean value: 0.9669464428457234 key: test_roc_auc value: [0.90322581 0.96774194 0.96774194 0.90322581 0.93548387 0.85483871 0.9516129 0.91935484 0.85053763 0.58333333] mean value: 0.8837096774193549 key: train_roc_auc value: [0.93345324 0.96043165 0.97122302 0.96043165 0.95863309 0.82733813 0.97122302 0.96582734 0.94084732 0.57706093] mean value: 0.9066469405121065 key: test_jcc value: [0.80645161 0.93939394 0.9375 0.8125 0.88571429 0.775 0.91176471 0.85294118 0.70967742 0.55357143] mean value: 0.818451456829066 key: train_jcc value: [0.86879433 0.92542373 0.94463668 0.92542373 0.92255892 0.74193548 0.94501718 0.93515358 0.88256228 0.54085603] mean value: 0.8632361942955643 MCC on Blind test: 0.1 Accuracy on Blind test: 0.29 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra 
Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01686311 0.01279736 0.01273036 0.01324439 0.01300955 0.01324821 0.01373839 0.01282573 0.01237702 0.01400542] mean value: 0.013483953475952149 key: score_time value: [0.01079369 0.01044965 0.01073813 0.0103972 0.01031804 0.01030827 0.01035023 0.01034307 0.01035166 0.01030016] mean value: 0.010435009002685547 key: test_mcc value: [0.87831007 0.74161985 0.78446454 0.71567809 0.79471941 0.93548387 0.96824584 0.84983659 0.77072165 0.90586325] mean value: 0.8344943153997917 key: train_mcc value: [0.92518498 0.76865678 0.81406658 0.92923662 0.90265061 0.89965316 0.92844206 0.89154571 0.92828039 0.93998809] mean value: 0.8927704971400476 key: test_accuracy value: [0.93548387 0.85483871 0.88709677 0.83870968 0.88709677 0.96774194 0.98387097 0.91935484 0.8852459 0.95081967] mean value: 0.9110259122157589 key: train_accuracy value: [0.96223022 0.87230216 0.89928058 0.96402878 0.94964029 0.94964029 0.96402878 0.9442446 0.96409336 0.96947935] mean value: 0.9438968394404763 key: test_fscore value: [0.93103448 0.83018868 0.89552239 0.80769231 0.89855072 0.96774194 0.98360656 0.9122807 0.88135593 0.95384615] mean value: 0.9061819863058443 key: train_fscore value: [0.96146789 0.85420945 0.90819672 0.96309963 0.95172414 0.94890511 0.96350365 0.94183865 0.96441281 0.97012302] mean value: 0.9427481068247102 key: test_precision value: [1. 1. 0.83333333 1. 0.81578947 0.96774194 1. 1. 
0.89655172 0.91176471] mean value: 0.9425181172521699 key: train_precision value: [0.98127341 0.99521531 0.83433735 0.98863636 0.91390728 0.96296296 0.97777778 0.98431373 0.95759717 0.94845361] mean value: 0.9544474964669887 key: test_recall value: [0.87096774 0.70967742 0.96774194 0.67741935 1. 0.96774194 0.96774194 0.83870968 0.86666667 1. ] mean value: 0.8866666666666667 key: train_recall value: [0.94244604 0.74820144 0.99640288 0.93884892 0.99280576 0.9352518 0.94964029 0.9028777 0.97132616 0.99280576] mean value: 0.937060674041412 key: test_roc_auc value: [0.93548387 0.85483871 0.88709677 0.83870968 0.88709677 0.96774194 0.98387097 0.91935484 0.88494624 0.95 ] mean value: 0.9109139784946236 key: train_roc_auc value: [0.96223022 0.87230216 0.89928058 0.96402878 0.94964029 0.94964029 0.96402878 0.9442446 0.96408035 0.96952116] mean value: 0.9438997189345298 key: test_jcc value: [0.87096774 0.70967742 0.81081081 0.67741935 0.81578947 0.9375 0.96774194 0.83870968 0.78787879 0.91176471] mean value: 0.832825990728842 key: train_jcc value: [0.92579505 0.74551971 0.83183183 0.92882562 0.90789474 0.90277778 0.92957746 0.89007092 0.93127148 0.94197952] mean value: 0.8935544122114777 MCC on Blind test: 0.1 Accuracy on Blind test: 0.4 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', 
min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.10854602 0.09391761 0.09340096 0.09336042 0.09349442 0.0939045 0.09685636 0.09437943 0.09400725 0.09450531] mean value: 0.09563722610473632 key: score_time value: [0.01416063 0.01400757 0.01419139 0.0142355 0.01414442 0.01419091 0.01533508 0.01431847 0.01418138 0.0142591 ] mean value: 0.014302444458007813 key: test_mcc value: [0.96824584 1. 0.96824584 0.96824584 0.96824584 0.96824584 1. 0.96824584 0.90215054 1. ] mean value: 0.9711625556945535 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 1. 0.98387097 0.98387097 0.98387097 0.98387097 1. 0.98387097 0.95081967 1. ] mean value: 0.985404547858276 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98360656 1. 0.98412698 0.98360656 0.98412698 0.98412698 1. 0.98360656 0.95081967 1. ] mean value: 0.9854020296643248 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96875 1. 0.96875 0.96875 1. 1. 0.93548387 1. ] mean value: 0.9841733870967742 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96774194 1. 1. 0.96774194 1. 1. 1. 0.96774194 0.96666667 1. ] mean value: 0.986989247311828 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 1. 0.98387097 0.98387097 0.98387097 0.98387097 1. 0.98387097 0.95107527 1. ] mean value: 0.9854301075268818 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.96774194 1. 0.96875 0.96774194 0.96875 0.96875 1. 0.96774194 0.90625 1. ] mean value: 0.9715725806451613 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.21 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates. 
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:747: UserWarning: Some inputs do not have OOB scores. This probably means too few estimators were used to compute any reliable oob estimates.
warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_bagging.py:753: RuntimeWarning: invalid value encountered in true_divide oob_decision_function = predictions / predictions.sum(axis=1)[:, np.newaxis] Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03534484 0.0434587 0.05311441 0.03592443 0.03713274 0.05070066 0.05334473 0.05310988 0.04448533 0.03317356] mean value: 0.04397892951965332 key: score_time value: [0.022789 0.0229876 0.02233076 0.01710248 0.01946139 0.03598452 0.02479911 0.02968454 0.01835775 0.03061008] mean value: 0.024410724639892578 key: test_mcc value: [0.93743687 0.93743687 0.93548387 0.93743687 0.93548387 0.96824584 0.96824584 0.87831007 0.90215054 0.96774194] mean value: 0.9367972553494428 key: train_mcc value: [1. 0.99640932 0.99640932 0.99640932 1. 1. 0.99283145 1. 0.99641572 0.99641572] mean value: 0.9974890870152905 key: test_accuracy value: [0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097 0.98387097 0.93548387 0.95081967 0.98360656] mean value: 0.9676361713379165 key: train_accuracy value: [1. 0.99820144 0.99820144 0.99820144 1. 1. 0.99640288 1. 0.99820467 0.99820467] mean value: 0.9987416529971714 key: test_fscore value: [0.96666667 0.96666667 0.96774194 0.96666667 0.96774194 0.98412698 0.98360656 0.93103448 0.95081967 0.98360656] mean value: 0.9668678124738592 key: train_fscore value: [1. 0.9981982 0.9981982 0.9981982 1. 1. 0.99638989 1. 0.99821109 0.9981982 ] mean value: 0.9987393775723891 key: test_precision value: [1. 1. 0.96774194 1. 0.96774194 0.96875 1. 1. 0.93548387 1. ] mean value: 0.9839717741935484 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 0.99642857 1. ] mean value: 0.9996428571428572 key: test_recall value: [0.93548387 0.93548387 0.96774194 0.93548387 0.96774194 1. 
0.96774194 0.87096774 0.96666667 0.96774194] mean value: 0.951505376344086 key: train_recall value: [1. 0.99640288 0.99640288 0.99640288 1. 1. 0.99280576 1. 1. 0.99640288] mean value: 0.9978417266187051 key: test_roc_auc value: [0.96774194 0.96774194 0.96774194 0.96774194 0.96774194 0.98387097 0.98387097 0.93548387 0.95107527 0.98387097] mean value: 0.9676881720430108 key: train_roc_auc value: [1. 0.99820144 0.99820144 0.99820144 1. 1. 0.99640288 1. 0.99820144 0.99820144] mean value: 0.9987410071942446 key: test_jcc value: [0.93548387 0.93548387 0.9375 0.93548387 0.9375 0.96875 0.96774194 0.87096774 0.90625 0.96774194] mean value: 0.9362903225806452 key: train_jcc value: [1. 0.99640288 0.99640288 0.99640288 1. 1. 0.99280576 1. 0.99642857 0.99640288] mean value: 0.9974845837615622 MCC on Blind test: 0.06 Accuracy on Blind test: 0.21 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, 
missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.18122554 0.21741056 0.19656324 0.19563127 0.21628571 0.1628089 0.20888186 0.19683623 0.13708878 0.13875246] mean value: 0.18514845371246338 key: score_time value: [0.02055907 0.02068186 0.02069783 0.02077031 0.02077198 0.01287293 0.02073574 0.02086687 0.01321125 0.02461028] mean value: 0.019577813148498536 key: test_mcc value: [0.67741935 0.74819006 0.74348441 0.69047575 0.80813523 0.87278605 0.81325006 0.81325006 0.54086022 0.74352218] mean value: 0.7451373368256522 key: train_mcc value: [0.87415162 0.87059372 0.86758591 0.89596753 0.88157448 0.87455914 0.87086426 0.86758591 0.87459701 0.88883589] mean value: 0.8766315468831808 key: test_accuracy value: [0.83870968 0.87096774 0.87096774 0.83870968 0.90322581 0.93548387 0.90322581 0.90322581 0.7704918 0.86885246] mean value: 0.870386039132734 key: train_accuracy value: [0.93705036 0.9352518 0.93345324 0.94784173 0.94064748 0.93705036 0.9352518 0.93345324 0.93716338 0.9443447 ] mean value: 0.9381508078994614 key: test_fscore value: [0.83870968 0.87878788 0.875 0.82142857 0.9 0.9375 0.90909091 0.90909091 0.76666667 0.87878788] mean value: 0.8715062491272169 key: train_fscore value: [0.93738819 0.93571429 0.93474427 0.94849023 0.94138544 0.9380531 0.93617021 0.93474427 0.9380531 0.94474153] mean value: 0.9389484621579285 key: test_precision value: [0.83870968 0.82857143 0.84848485 0.92 0.93103448 0.90909091 0.85714286 0.85714286 0.76666667 0.82857143] mean value: 0.8585415155848971 key: train_precision value: [0.93238434 0.92907801 0.91695502 0.93684211 0.92982456 
0.92334495 0.92307692 0.91695502 0.92657343 0.93639576] mean value: 0.9271430114193007 key: test_recall value: [0.83870968 0.93548387 0.90322581 0.74193548 0.87096774 0.96774194 0.96774194 0.96774194 0.76666667 0.93548387] mean value: 0.8895698924731182 key: train_recall value: [0.94244604 0.94244604 0.95323741 0.96043165 0.95323741 0.95323741 0.94964029 0.95323741 0.94982079 0.95323741] mean value: 0.9510971867667156 key: test_roc_auc value: [0.83870968 0.87096774 0.87096774 0.83870968 0.90322581 0.93548387 0.90322581 0.90322581 0.77043011 0.86774194] mean value: 0.870268817204301 key: train_roc_auc value: [0.93705036 0.9352518 0.93345324 0.94784173 0.94064748 0.93705036 0.9352518 0.93345324 0.93714061 0.94436064] mean value: 0.9381501250612413 key: test_jcc value: [0.72222222 0.78378378 0.77777778 0.6969697 0.81818182 0.88235294 0.83333333 0.83333333 0.62162162 0.78378378] mean value: 0.7753360312183841 key: train_jcc value: [0.88215488 0.87919463 0.87748344 0.90202703 0.88926174 0.88333333 0.88 0.87748344 0.88333333 0.89527027] mean value: 0.884954210937499 MCC on Blind test: 0.22 Accuracy on Blind test: 0.49 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', 
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.25665951 0.24086618 0.24189734 0.23880529 0.24180579 0.24213672 0.24336982 0.24555063 0.24932742 0.25003719] mean value: 0.2450455904006958 key: score_time value: [0.00856853 0.0083406 0.00863934 0.00827336 0.00876927 0.00846887 0.00852108 0.00857925 0.00858474 0.00857282] mean value: 0.008531785011291504 key: test_mcc value: [0.96824584 0.96824584 0.93548387 1. 1. 0.96824584 1. 0.96824584 0.9344086 1. ] mean value: 0.9742875819325697 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.98387097 0.98387097 0.96774194 1. 1. 0.98387097 1. 0.98387097 0.96721311 1. ] mean value: 0.9870438921205711 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.98360656 0.98360656 0.96774194 1. 1. 0.98412698 1. 0.98360656 0.96666667 1. ] mean value: 0.986935525840867 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.96774194 1. 1. 0.96875 1. 1. 0.96666667 1. ] mean value: 0.9903158602150538 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.96774194 0.96774194 0.96774194 1. 1. 1. 1. 0.96774194 0.96666667 1. ] mean value: 0.983763440860215 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.98387097 0.98387097 0.96774194 1. 1. 0.98387097 1. 0.98387097 0.9672043 1. ] mean value: 0.9870430107526882 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.96774194 0.96774194 0.9375 1. 1. 0.96875 1. 0.96774194 0.93548387 1. ] mean value: 0.9744959677419355 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.19 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.01201487 0.01361275 0.01402354 0.01380372 0.01379108 0.02837563 0.01576805 0.01669693 0.02459884 0.01624608] mean value: 0.01689314842224121 key: score_time value: [0.0111146 0.01098752 0.01094055 0.01091433 0.01093793 0.01111317 0.01172638 0.01128578 0.01110101 0.01107979] mean value: 0.011120104789733886 key: test_mcc value: [0.74193548 0.80813523 0.81325006 0.52297636 0.74819006 0.67419986 0.67883359 0.81325006 0.72516604 0.71375712] mean value: 0.7239693864680706 key: train_mcc value: [0.82567165 0.81659431 0.79995316 0.7380124 0.83549358 0.78285538 0.76623167 0.78683637 0.87297353 0.8490525 ] mean value: 0.8073674571945186 key: test_accuracy value: [0.87096774 0.90322581 0.90322581 0.75806452 0.87096774 0.82258065 0.83870968 0.90322581 0.85245902 0.85245902] mean value: 0.8575885774722369 key: train_accuracy value: [0.9118705 0.90827338 0.89748201 0.85971223 0.91546763 
0.88848921 0.88309353 0.89028777 0.93536804 0.92280072] mean value: 0.9012845020213631 key: test_fscore value: [0.87096774 0.9 0.89655172 0.73684211 0.87878788 0.84507042 0.83333333 0.89655172 0.86567164 0.86567164] mean value: 0.8589448213713017 key: train_fscore value: [0.90875233 0.90876565 0.89142857 0.84210526 0.91965812 0.89491525 0.88245931 0.88291747 0.93771626 0.92598967] mean value: 0.8994707904383525 key: test_precision value: [0.87096774 0.93103448 0.96296296 0.80769231 0.82857143 0.75 0.86206897 0.96296296 0.78378378 0.80555556] mean value: 0.8565600191740348 key: train_precision value: [0.94208494 0.90391459 0.94736842 0.96296296 0.8762215 0.84615385 0.88727273 0.94650206 0.90635452 0.88778878] mean value: 0.9106624340187 key: test_recall value: [0.87096774 0.87096774 0.83870968 0.67741935 0.93548387 0.96774194 0.80645161 0.83870968 0.96666667 0.93548387] mean value: 0.8708602150537634 key: train_recall value: [0.87769784 0.91366906 0.84172662 0.74820144 0.9676259 0.94964029 0.87769784 0.82733813 0.97132616 0.9676259 ] mean value: 0.8942549186457286 key: test_roc_auc value: [0.87096774 0.90322581 0.90322581 0.75806452 0.87096774 0.82258065 0.83870968 0.90322581 0.85430108 0.85107527] mean value: 0.8576344086021506 key: train_roc_auc value: [0.9118705 0.90827338 0.89748201 0.85971223 0.91546763 0.88848921 0.88309353 0.89028777 0.93530337 0.92288105] mean value: 0.9012860679198577 key: test_jcc value: [0.77142857 0.81818182 0.8125 0.58333333 0.78378378 0.73170732 0.71428571 0.8125 0.76315789 0.76315789] mean value: 0.7554036327560076 key: train_jcc value: [0.83276451 0.83278689 0.80412371 0.72727273 0.85126582 0.80981595 0.78964401 0.79037801 0.88273616 0.86217949] mean value: 0.8182967266032459 MCC on Blind test: 0.15 Accuracy on Blind test: 0.77 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', 
LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', 
MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01394916 0.01445484 0.01928425 0.01152682 0.01178312 0.01156855 0.01153874 0.0114634 0.01149082 0.01142311] mean value: 0.012848281860351562 key: score_time value: [0.01326442 0.01076293 0.01061916 0.01054502 0.01053524 0.01052785 0.01054406 0.01065135 0.01066136 0.01065755] mean value: 0.010876893997192383 key: test_mcc value: [0.90369611 1. 0.90369611 0.87831007 0.80813523 0.93743687 0.90369611 1. 0.77072165 0.90586325] mean value: 0.9011555410657976 key: train_mcc value: [0.91741458 0.92145965 0.92518498 0.92475364 0.93914669 0.91054923 0.93214329 0.92475364 0.93206857 0.92840473] mean value: 0.9255878992579923 key: test_accuracy value: [0.9516129 1. 0.9516129 0.93548387 0.90322581 0.96774194 0.9516129 1. 0.8852459 0.95081967] mean value: 0.9497355896351137 key: train_accuracy value: [0.95863309 0.96043165 0.96223022 0.96223022 0.96942446 0.95503597 0.96582734 0.96223022 0.96588869 0.96409336] mean value: 0.9626025212146261 key: test_fscore value: [0.95238095 1. 0.95238095 0.93103448 0.90625 0.96875 0.95238095 1. 
0.88135593 0.95384615] mean value: 0.9498379425951021 key: train_fscore value: [0.95900178 0.96113074 0.96296296 0.96269982 0.96980462 0.95575221 0.96637168 0.96269982 0.96637168 0.96441281] mean value: 0.9631208137030208 key: test_precision value: [0.9375 1. 0.9375 1. 0.87878788 0.93939394 0.9375 1. 0.89655172 0.91176471] mean value: 0.9438998248202102 key: train_precision value: [0.95053004 0.94444444 0.94463668 0.95087719 0.95789474 0.94076655 0.95121951 0.95087719 0.95454545 0.95422535] mean value: 0.9500017150163743 key: test_recall value: [0.96774194 1. 0.96774194 0.87096774 0.93548387 1. 0.96774194 1. 0.86666667 1. ] mean value: 0.9576344086021505 key: train_recall value: [0.9676259 0.97841727 0.98201439 0.97482014 0.98201439 0.97122302 0.98201439 0.97482014 0.97849462 0.97482014] mean value: 0.9766264407828575 key: test_roc_auc value: [0.9516129 1. 0.9516129 0.93548387 0.90322581 0.96774194 0.9516129 1. 0.88494624 0.95 ] mean value: 0.9496236559139786 key: train_roc_auc value: [0.95863309 0.96043165 0.96223022 0.96223022 0.96942446 0.95503597 0.96582734 0.96223022 0.96586602 0.96411258] mean value: 0.9626021763234573 key: test_jcc value: [0.90909091 1. 0.90909091 0.87096774 0.82857143 0.93939394 0.90909091 1. 
0.78787879 0.91176471] mean value: 0.906584933093472 key: train_jcc value: [0.92123288 0.92517007 0.92857143 0.92808219 0.94137931 0.91525424 0.93493151 0.92808219 0.93493151 0.93127148] mean value: 0.9288906795867435 MCC on Blind test: 0.19 Accuracy on Blind test: 0.44 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:203: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./katg_config.py:206: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, 
gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_ppi2_affinity', 'interface_dist', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.11430693 0.1973083 0.16238761 0.19749618 0.09576607 0.1363256 0.09693003 0.12462544 0.20535517 0.23405504] mean value: 0.1564556360244751 key: score_time value: [0.01904321 0.02068663 0.02037406 0.02086973 0.01105285 0.01114559 0.01939201 0.01616836 0.01520658 0.01612258] mean value: 0.01700615882873535 key: test_mcc value: [0.90369611 1. 0.93548387 0.87831007 0.84266484 0.93743687 0.93743687 0.96824584 0.77072165 0.93635873] mean value: 0.9110354846088805 key: train_mcc value: [0.92844206 0.93563929 0.9393413 0.93563929 0.94986154 0.9393413 0.93214329 0.92844206 0.94264494 0.93558747] mean value: 0.9367082543906752 key: test_accuracy value: [0.9516129 1. 0.96774194 0.93548387 0.91935484 0.96774194 0.96774194 0.98387097 0.8852459 0.96721311] mean value: 0.9546007403490216 key: train_accuracy value: [0.96402878 0.9676259 0.96942446 0.9676259 0.97482014 0.96942446 0.96582734 0.96402878 0.97127469 0.96768402] mean value: 0.9681764462756545 key: test_fscore value: [0.95238095 1. 0.96774194 0.93103448 0.92307692 0.96875 0.96875 0.98360656 0.88135593 0.96875 ] mean value: 0.9545446783280807 key: train_fscore value: [0.96453901 0.96808511 0.9699115 0.96808511 0.97508897 0.9699115 0.96637168 0.96453901 0.97153025 0.96797153] mean value: 0.9686033664546803 key: test_precision value: [0.9375 1. 0.96774194 1. 0.88235294 0.93939394 0.93939394 1. 
0.89655172 0.93939394] mean value: 0.950232841898009 key: train_precision value: [0.95104895 0.95454545 0.95470383 0.95454545 0.96478873 0.95470383 0.95121951 0.95104895 0.96466431 0.95774648] mean value: 0.9559015511110829 key: test_recall value: [0.96774194 1. 0.96774194 0.87096774 0.96774194 1. 1. 0.96774194 0.86666667 1. ] mean value: 0.9608602150537635 key: train_recall value: [0.97841727 0.98201439 0.98561151 0.98201439 0.98561151 0.98561151 0.98201439 0.97841727 0.97849462 0.97841727] mean value: 0.9816624120058792 key: test_roc_auc value: [0.9516129 1. 0.96774194 0.93548387 0.91935484 0.96774194 0.96774194 0.98387097 0.88494624 0.96666667] mean value: 0.9545161290322581 key: train_roc_auc value: [0.96402878 0.9676259 0.96942446 0.9676259 0.97482014 0.96942446 0.96582734 0.96402878 0.9712617 0.96770326] mean value: 0.9681770712462289 key: test_jcc value: [0.90909091 1. 0.9375 0.87096774 0.85714286 0.93939394 0.93939394 0.96774194 0.78787879 0.93939394] mean value: 0.9148504049713727 key: train_jcc value: [0.93150685 0.93814433 0.94158076 0.93814433 0.95138889 0.94158076 0.93493151 0.93150685 0.94463668 0.93793103] mean value: 0.9391351978873097 MCC on Blind test: 0.15 Accuracy on Blind test: 0.38
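Note on reading the blind-test figures: the log reports both MCC and accuracy on a blind test set of 315 class-0 and 35 class-1 samples, and the two can diverge sharply on data this imbalanced. The sketch below is not taken from the pipeline scripts; it is a minimal, standard-library illustration of the two metrics computed from confusion-matrix counts. The function names and the all-zeros predictor are hypothetical, introduced only for illustration, and returning 0.0 when the MCC denominator is zero is an assumed convention.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumed convention: return 0.0 when the denominator is zero,
    i.e. when a whole row or column of the confusion matrix is empty.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Class composition of the blind test set as reported in the log:
# Counter({0: 315, 1: 35}).
y_true = [0] * 315 + [1] * 35

# Hypothetical predictor that always outputs the majority class.
y_all_zero = [0] * 350

counts = confusion_counts(y_true, y_all_zero)
baseline_accuracy = accuracy(*counts)
baseline_mcc = mcc(*counts)

print(f"accuracy={baseline_accuracy:.2f} mcc={baseline_mcc:.2f}")
```

With this class balance, always predicting class 0 already scores 0.90 accuracy with an MCC of 0, so blind-test accuracies below that level carry little information by themselves, and MCC is the more informative figure to compare across models.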