/home/tanu/git/LSHTM_analysis/scripts/ml/ml_data.py:550: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  mask_check.sort_values(by = ['ligand_distance'], ascending = True, inplace = True)
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/xgboost/compat.py:36: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import MultiIndex, Int64Index
1.22.4
1.4.1
aaindex_df contains non-numerical data
Total no. of non-numerial columns: 2
Selecting numerical data only
PASS: successfully selected numerical columns only for aaindex_df
Now checking for NA in the remaining aaindex_cols
Counting aaindex_df cols with NA
ncols with NA: 4 columns
Dropping these...
Original ncols: 127
Revised df ncols: 123
Checking NA in revised df...
PASS: cols with NA successfully dropped from aaindex_df
Proceeding with combining aa_df with other features_df
PASS: ncols match
Expected ncols: 123
Got: 123
Total no. of columns in clean aa_df: 123
Proceeding to merge, expected nrows in merged_df: 531
PASS: my_features_df and aa_df successfully combined
nrows: 531
ncols: 286
count of NULL values before imputation
or_mychisq          263
log10_or_mychisq    263
dtype: int64
count of NULL values AFTER imputation
mutationinformation    0
or_rawI                0
logorI                 0
dtype: int64
PASS: OR values imputed, data ready for ML
No. of numerical features: 44
No. of categorical features: 7
index: 0
ind: 1
Mask count check: True
index: 1
ind: 2
Mask count check: True
Original Data
Counter({0: 76, 1: 43}) Data dim: (119, 51)
-------------------------------------------------------------
Successfully split data: UQ [no aa_index but active site included] training
actual values: training set
imputed values: blind test set
Train data size: (119, 51)
Test data size: (412, 51)
y_train numbers: Counter({0: 76, 1: 43})
y_train ratio: 1.7674418604651163
y_test_numbers: Counter({0: 409, 1: 3})
y_test ratio: 136.33333333333334
-------------------------------------------------------------
Simple Random OverSampling
Counter({0: 76, 1: 76})
(152, 51)
Simple Random UnderSampling
Counter({0: 43, 1: 43})
(86, 51)
Simple Combined Over and UnderSampling
Counter({0: 76, 1: 76})
(152, 51)
SMOTE_NC OverSampling
Counter({0: 76, 1: 76})
(152, 51)
#####################################################################
Running ML analysis: UQ [without AA index but with active site annotations]
Gene name: gid
Drug name: streptomycin
Output directory: /home/tanu/git/Data/streptomycin/output/ml/uq_v1/
Sanity checks:
Total input features: 51
Training data size: (119, 51)
Test data size: (412, 51)
Target feature numbers (training data): Counter({0: 76, 1: 43})
Target features ratio (training data: 1.7674418604651163
Target feature numbers (test data): Counter({0: 409, 1: 3})
Target features ratio (test data): 136.33333333333334
#####################################################################
================================================================
Strucutral features (n): 35
These are:
Common stablity features: ['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity']
FoldX columns: ['electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr', 'disulfide_mm', 'disulfide_sm', 'disulfide_ss', 'hbonds_rr', 'hbonds_mm', 'hbonds_sm', 'hbonds_ss', 'partcov_rr', 'partcov_mm', 'partcov_sm', 'partcov_ss', 'vdwclashes_rr', 'vdwclashes_mm', 'vdwclashes_sm', 'vdwclashes_ss', 'volumetric_rr', 'volumetric_mm', 'volumetric_ss']
Other struc columns: ['rsa', 'kd_values', 'rd_values']
================================================================
Evolutionary features (n): 3
These are:
['consurf_score', 'snap2_score', 'provean_score']
================================================================
Genomic features (n): 6
These are:
['maf', 'logorI']
['lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique']
================================================================
Categorical features (n): 7
These are:
['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site']
================================================================
Pass: No. of features match
#####################################################################
Model_name: Logistic Regression
Model func: LogisticRegression(random_state=42)
List of models: [('Logistic Regression', LogisticRegression(random_state=42)),
('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
('Gaussian NB', GaussianNB()),
('Naive Bayes', BernoulliNB()),
('K-Nearest Neighbors', KNeighborsClassifier()),
('SVM', SVC(random_state=42)),
('MLP', MLPClassifier(max_iter=500, random_state=42)),
('Decision Tree', DecisionTreeClassifier(random_state=42)),
('Extra Trees', ExtraTreesClassifier(random_state=42)),
('Extra Tree', ExtraTreeClassifier(random_state=42)),
('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
('Naive Bayes', BernoulliNB()),
('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
('LDA', LinearDiscriminantAnalysis()),
('Multinomial', MultinomialNB()),
('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
('Gaussian Process', GaussianProcessClassifier(random_state=42)),
('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
('QDA', QuadraticDiscriminantAnalysis()),
('Ridge Classifier', RidgeClassifier(random_state=42)),
('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(),
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')),
('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])),
('model', LogisticRegression(random_state=42))])
key: fit_time
value: [0.01348901 0.01226354 0.01228833 0.01419163 0.01196408 0.01235008 0.01226306 0.01198006 0.01196694 0.01303458]
mean value: 0.012579131126403808
key: score_time
value: [0.00877213 0.00875831 0.0089767 0.00837636 0.00834727 0.00831747 0.00833845 0.00829291 0.00832725 0.00867438]
mean value: 0.008518123626708984
key: test_mcc
value: [0.42640143 0.40824829 0. 0.625 0.63245553 0.70710678 0.68313005 0.83666003 0.31428571 0.62360956]
mean value: 0.5256897392741394
key: train_mcc
value: [0.73433335 0.80052092 0.81774488 0.71490799 0.77603911 0.73433335 0.75414636 0.75414636 0.79379397 0.7364483 ]
mean value: 0.7616414584886299
key: test_accuracy
value:
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [0.75 0.75 0.5 0.83333333 0.83333333 0.83333333 0.83333333 0.91666667 0.66666667 0.81818182] mean value: 0.7734848484848484 key: train_accuracy value: [0.87850467 0.90654206 0.91588785 0.86915888 0.89719626 0.87850467 0.88785047 0.88785047 0.90654206 0.87962963] mean value: 0.89076670128072 key: test_fscore value: [0.4 0.57142857 0.4 0.75 0.66666667 0.8 0.75 0.88888889 0.6 0.66666667] mean value: 0.6493650793650794 key: train_fscore value: [0.82191781 0.85714286 0.87671233 0.8 0.84931507 0.82191781 0.82352941 0.82352941 0.86111111 0.81690141] mean value: 0.8352077213932715 key: test_precision value: [1. 0.66666667 0.33333333 0.75 1. 0.66666667 1. 1. 0.6 1. ] mean value: 0.8016666666666666 key: train_precision value: [0.88235294 0.96774194 0.94117647 0.90322581 0.91176471 0.88235294 0.93333333 0.93333333 0.91176471 0.90625 ] mean value: 0.9173296173308033 key: test_recall value: [0.25 0.5 0.5 0.75 0.5 1.
0.6 0.8 0.6 0.5 ] mean value: 0.6 key: train_recall value: [0.76923077 0.76923077 0.82051282 0.71794872 0.79487179 0.76923077 0.73684211 0.73684211 0.81578947 0.74358974] mean value: 0.7674089068825911 key: test_roc_auc value: [0.625 0.6875 0.5 0.8125 0.75 0.875 0.8 0.9 0.65714286 0.75 ] mean value: 0.7357142857142858 key: train_roc_auc value: [0.85520362 0.87726244 0.89555053 0.83691554 0.87537707 0.85520362 0.8539283 0.8539283 0.88615561 0.85005574] mean value: 0.8639580766297014 key: test_jcc value: [0.25 0.4 0.25 0.6 0.5 0.66666667 0.6 0.8 0.42857143 0.5 ] mean value: 0.49952380952380954 key: train_jcc value: [0.69767442 0.75 0.7804878 0.66666667 0.73809524 0.69767442 0.7 0.7 0.75609756 0.69047619] mean value: 0.7177172298301056 MCC on Blind test: 0.15 Accuracy on Blind test: 0.77 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.38598299 0.37587595 0.37197232 0.36586213 0.3655436 0.3787601 0.37384486 0.3567059 0.36220002 0.34969902] mean value: 0.3686446905136108 key: score_time value: [0.00918126 0.00917006 0.00951552 0.00891447 0.00908375 0.00929761 0.00941563 0.00886798 0.00938153 0.00919795] mean value: 0.00920257568359375 key: test_mcc value: [1. 0.625 0.35355339 0.83666003 0.625 0.70710678 0.83666003 0.83666003 0.50709255 0.60714286] mean value: 0.6934875661362015 key: train_mcc value: [0.89876312 1. 0.9600061 0.85805669 0.95965309 0.95965309 0.93862091 0.85625561 1. 0.81859189] mean value: 0.9249600511154796 key: test_accuracy value: [1. 0.83333333 0.66666667 0.91666667 0.83333333 0.83333333 0.91666667 0.91666667 0.75 0.81818182] mean value: 0.8484848484848485 key: train_accuracy value: [0.95327103 1. 0.98130841 0.93457944 0.98130841 0.98130841 0.97196262 0.93457944 1. 0.91666667] mean value: 0.9654984423676012 key: test_fscore value: [1. 0.75 0.6 0.88888889 0.75 0.8 0.88888889 0.88888889 0.72727273 0.75 ] mean value: 0.8043939393939394 key: train_fscore value: [0.93506494 1. 0.97368421 0.90666667 0.97435897 0.97435897 0.96 0.90410959 1. 0.87671233] mean value: 0.9504955678784085 key: test_precision value: [1. 0.75 0.5 0.8 0.75 0.66666667 1. 1. 0.66666667 0.75 ] mean value: 0.7883333333333333 key: train_precision value: [0.94736842 1. 1. 0.94444444 0.97435897 0.97435897 0.97297297 0.94285714 1. 0.94117647] mean value: 0.9697537400633376 key: test_recall value: [1. 0.75 0.75 1. 0.75 1. 
0.8 0.8 0.8 0.75] mean value: 0.84 key: train_recall value: [0.92307692 1. 0.94871795 0.87179487 0.97435897 0.97435897 0.94736842 0.86842105 1. 0.82051282] mean value: 0.9328609986504723 key: test_roc_auc value: [1. 0.8125 0.6875 0.9375 0.8125 0.875 0.9 0.9 0.75714286 0.80357143] mean value: 0.8485714285714285 key: train_roc_auc value: [0.94683258 1. 0.97435897 0.92119155 0.97982655 0.97982655 0.96643783 0.91971777 1. 0.89576366] mean value: 0.9583955462135567 key: test_jcc value: [1. 0.6 0.42857143 0.8 0.6 0.66666667 0.8 0.8 0.57142857 0.6 ] mean value: 0.6866666666666666 key: train_jcc value: [0.87804878 1. 0.94871795 0.82926829 0.95 0.95 0.92307692 0.825 1. 0.7804878 ] mean value: 0.9084599749843653 MCC on Blind test: 0.01 Accuracy on Blind test: 0.7 Model_name: Gaussian NB Model func: GaussianNB() Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.00942707 0.00902557 0.00695324 0.00660396 0.00666595 0.00662398 0.00659394 0.00675702 0.00663805 0.00661373] mean value: 0.007190251350402832 key: score_time value: [0.01058674 0.01051068 0.00814915 0.00790286 0.00790191 0.0078671 0.00790071 0.00779772 0.00793719 0.0078752 ] mean value: 0.008442926406860351 key: test_mcc value: [0.81649658 0.47809144 0.5 0.23904572 0.35355339 0.47809144 0.16903085 0.50709255 0.16903085 0.35634832] mean value: 0.4066781158133809 key: train_mcc value: [0.63375685 0.67693504 0.66003337 0.51450646 0.70701192 0.58648859 0.69614472 0.60558322 0.65590587 0.6700827 ] mean value: 0.6406448743407447 key: test_accuracy value: [0.91666667 0.75 0.66666667 0.58333333 0.66666667 0.75 0.58333333 0.75 0.58333333 0.54545455] mean value: 0.6795454545454546 key: train_accuracy value: [0.80373832 0.8317757 0.8317757 0.71962617 0.85046729 0.81308411 0.8317757 0.82242991 0.80373832 0.83333333] mean value: 0.8141744548286605 key: test_fscore value: [0.85714286 0.66666667 0.66666667 0.54545455 0.6 0.66666667 0.54545455 0.72727273 0.54545455 0.61538462] mean value: 0.6436163836163835 key: train_fscore value: [0.77419355 0.8 0.79069767 0.70588235 0.81818182 0.71428571 0.80434783 0.6984127 0.77894737 0.79545455] mean value: 0.7680403546589664 key: test_precision value: [1. 
0.6 0.5 0.42857143 0.5 0.6 0.5 0.66666667 0.5 0.44444444] mean value: 0.5739682539682539 key: train_precision value: [0.66666667 0.70588235 0.72340426 0.57142857 0.73469388 0.80645161 0.68518519 0.88 0.64912281 0.71428571] mean value: 0.7137121043298253 key: test_recall value: [0.75 0.75 1. 0.75 0.75 0.75 0.6 0.8 0.6 1. ] mean value: 0.775 key: train_recall value: [0.92307692 0.92307692 0.87179487 0.92307692 0.92307692 0.64102564 0.97368421 0.57894737 0.97368421 0.8974359 ] mean value: 0.8628879892037787 key: test_roc_auc value: [0.875 0.75 0.75 0.625 0.6875 0.75 0.58571429 0.75714286 0.58571429 0.64285714] mean value: 0.7008928571428571 key: train_roc_auc value: [0.82918552 0.85124434 0.8403092 0.76300905 0.86595023 0.77639517 0.8636537 0.76773455 0.84191457 0.84726867] mean value: 0.8246665009957512 key: test_jcc value: [0.75 0.5 0.5 0.375 0.42857143 0.5 0.375 0.57142857 0.375 0.44444444] mean value: 0.48194444444444445 key: train_jcc value: [0.63157895 0.66666667 0.65384615 0.54545455 0.69230769 0.55555556 0.67272727 0.53658537 0.63793103 0.66037736] mean value: 0.625303059275329 MCC on Blind test: 0.03 Accuracy on Blind test: 0.49 Model_name: Naive Bayes Model func: BernoulliNB() /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. 
_warn_prf(average, modifier, msg_start, len(result)) Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00705171 0.00684381 0.00682545 0.00677752 0.0068419 0.0067687 0.00679421 0.00680947 0.0066936 0.00685263] mean value: 0.0068259000778198246 key: score_time value: [0.00836945 0.0079124 0.0078671 0.00792718 0.00797105 0.00790572 0.00787711 0.00786543 0.00802302 0.00789714] mean value: 0.007961559295654296 key: test_mcc value: [ 0. 0.25 -0.23904572 0.47809144 0.40824829 0. -0.09759001 0.52915026 0.31428571 0.38575837] mean value: 0.20288983564397506 key: train_mcc value: [0.4754902 0.50673892 0.4653488 0.44239297 0.48817818 0.50337256 0.39242808 0.39534618 0.48161946 0.37522992] mean value: 0.4526145268371993 key: test_accuracy value: [0.66666667 0.66666667 0.41666667 0.75 0.75 0.41666667 0.5 0.75 0.66666667 0.72727273] mean value: 0.6310606060606061 key: train_accuracy value: [0.75700935 0.77570093 0.75700935 0.74766355 0.76635514 0.77570093 0.72897196 0.71962617 0.76635514 0.72222222] mean value: 0.7516614745586708 key: test_fscore value: [0. 
0.5 0.22222222 0.66666667 0.57142857 0.46153846 0.25 0.57142857 0.6 0.57142857] mean value: 0.4414713064713065 key: train_fscore value: [0.66666667 0.67567568 0.64864865 0.63013699 0.66666667 0.66666667 0.5915493 0.61538462 0.65753425 0.57142857] mean value: 0.6390358039788872 key: test_precision value: [0. 0.5 0.2 0.6 0.66666667 0.33333333 0.33333333 1. 0.6 0.66666667] mean value: 0.49 key: train_precision value: [0.66666667 0.71428571 0.68571429 0.67647059 0.69444444 0.72727273 0.63636364 0.6 0.68571429 0.64516129] mean value: 0.6732093639019635 key: test_recall value: [0. 0.5 0.25 0.75 0.5 0.75 0.2 0.4 0.6 0.5 ] mean value: 0.445 key: train_recall value: [0.66666667 0.64102564 0.61538462 0.58974359 0.64102564 0.61538462 0.55263158 0.63157895 0.63157895 0.51282051] mean value: 0.6097840755735493 key: test_roc_auc value: [0.5 0.625 0.375 0.75 0.6875 0.5 0.45714286 0.7 0.65714286 0.67857143] mean value: 0.5930357142857143 key: train_roc_auc value: [0.7377451 0.74698341 0.72680995 0.71398944 0.73963047 0.74151584 0.68935927 0.69984744 0.73607933 0.67670011] mean value: 0.7208660360817448 key: test_jcc value: [0. 
0.33333333 0.125 0.5 0.4 0.3 0.14285714 0.4 0.42857143 0.4 ] mean value: 0.30297619047619045 key: train_jcc value: [0.5 0.51020408 0.48 0.46 0.5 0.5 0.42 0.44444444 0.48979592 0.4 ] mean value: 0.47044444444444444 MCC on Blind test: 0.14 Accuracy on Blind test: 0.73 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00671077 0.00906372 0.00673389 0.00643206 0.00652814 0.00724578 0.00731564 0.00711012 0.00714564 0.00714374] mean value: 0.007142949104309082 key: score_time value: [0.04456663 0.02610064 0.00889969 0.00866151 0.00879526 0.00940728 0.00941706 0.00944591 0.00944233 0.00941896] mean value: 0.014415526390075683 key: test_mcc value: [ 0. 0. 
0.47809144 0.625 0.15811388 0.47809144 0.07559289 0.29277002 0.11952286 -0.03857584] mean value: 0.21886067104052556 key: train_mcc value: [0.47836451 0.54358024 0.65128682 0.48080439 0.47687292 0.38417516 0.55925621 0.60298802 0.55802654 0.50141804] mean value: 0.5236772842107089 key: test_accuracy value: [0.66666667 0.58333333 0.75 0.83333333 0.66666667 0.75 0.58333333 0.66666667 0.58333333 0.54545455] mean value: 0.6628787878787878 key: train_accuracy value: [0.76635514 0.79439252 0.8411215 0.76635514 0.76635514 0.72897196 0.80373832 0.82242991 0.80373832 0.77777778] mean value: 0.7871235721703012 key: test_fscore value: [0. 0.28571429 0.66666667 0.75 0.33333333 0.66666667 0.28571429 0.5 0.44444444 0.28571429] mean value: 0.4218253968253968 key: train_fscore value: [0.63768116 0.66666667 0.75362319 0.64788732 0.62686567 0.53968254 0.69565217 0.70769231 0.67692308 0.64705882] mean value: 0.6599732931818586 key: test_precision value: [0. 0.33333333 0.6 0.75 0.5 0.6 0.5 0.66666667 0.5 0.33333333] mean value: 0.47833333333333333 key: train_precision value: [0.73333333 0.81481481 0.86666667 0.71875 0.75 0.70833333 0.77419355 0.85185185 0.81481481 0.75862069] mean value: 0.7791379052857084 key: test_recall value: [0. 0.25 0.75 0.75 0.25 0.75 0.2 0.4 0.4 0.25] mean value: 0.4 key: train_recall value: [0.56410256 0.56410256 0.66666667 0.58974359 0.53846154 0.43589744 0.63157895 0.60526316 0.57894737 0.56410256] mean value: 0.5738866396761133 key: test_roc_auc value: [0.5 0.5 0.75 0.8125 0.5625 0.75 0.52857143 0.62857143 0.55714286 0.48214286] mean value: 0.6071428571428571 key: train_roc_auc value: [0.72322775 0.74528658 0.80392157 0.72869532 0.71776018 0.66647813 0.76506484 0.77364607 0.7532418 0.73132664] mean value: 0.7408648884655077 key: test_jcc value: [0. 
0.16666667 0.5 0.6 0.2 0.5 0.16666667 0.33333333 0.28571429 0.16666667] mean value: 0.2919047619047619 key: train_jcc value: [0.46808511 0.5 0.60465116 0.47916667 0.45652174 0.36956522 0.53333333 0.54761905 0.51162791 0.47826087] mean value: 0.4948831049856425 MCC on Blind test: 0.04 Accuracy on Blind test: 0.82 Model_name: SVM Model func: SVC(random_state=42) Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00767875 0.00732279 0.00758123 0.00753927 0.00771689 0.00793982 0.00819755 0.00749803 0.00751305 0.00764227] mean value: 0.0076629638671875 key: score_time value: [0.00804496 0.00836205 0.00849819 0.0081315 0.00821066 0.00882912 0.0087676 0.00863767 0.00823522 0.00829029] mean value: 0.008400726318359374 key: test_mcc value: [0.42640143 0.40824829 0.11952286 0.81649658 0.63245553 0.83666003 0.35675303 0.52915026 0.11952286 0.41833001] mean value: 0.4663540894023734 key: train_mcc value: [0.71777084 0.72240602 0.71777084 0.69776211 0.73774797 0.67769958 0.672375 0.71336904 0.78283392 0.67891024] mean value: 0.7118645559720945 key: test_accuracy value: [0.75 0.75 0.58333333 0.91666667 0.83333333 0.91666667 0.66666667 0.75 0.58333333 0.72727273] mean 
value: 0.7477272727272727 key: train_accuracy value: [0.86915888 0.86915888 0.86915888 0.85981308 0.87850467 0.85046729 0.85046729 0.86915888 0.89719626 0.85185185] mean value: 0.8664935964001385 key: test_fscore value: [0.4 0.57142857 0.44444444 0.85714286 0.66666667 0.88888889 0.33333333 0.57142857 0.44444444 0.4 ] mean value: 0.5577777777777778 key: train_fscore value: [0.79411765 0.78787879 0.79411765 0.7761194 0.8115942 0.75757576 0.75 0.78787879 0.83076923 0.75757576] mean value: 0.7847627221679594 key: test_precision value: [1. 0.66666667 0.4 1. 1. 0.8 1. 1. 0.5 1. ] mean value: 0.8366666666666667 key: train_precision value: [0.93103448 0.96296296 0.93103448 0.92857143 0.93333333 0.92592593 0.92307692 0.92857143 1. 0.92592593] mean value: 0.939043689388517 key: test_recall value: [0.25 0.5 0.5 0.75 0.5 1. 0.2 0.4 0.4 0.25] mean value: 0.475 key: train_recall value: [0.69230769 0.66666667 0.69230769 0.66666667 0.71794872 0.64102564 0.63157895 0.68421053 0.71052632 0.64102564] mean value: 0.6744264507422402 key: test_roc_auc value: [0.625 0.6875 0.5625 0.875 0.75 0.9375 0.6 0.7 0.55714286 0.625 ] mean value: 0.6919642857142857 key: train_roc_auc value: [0.83144796 0.82598039 0.83144796 0.81862745 0.84426848 0.80580694 0.80129672 0.82761251 0.85526316 0.80602007] mean value: 0.8247771639900459 key: test_jcc value: [0.25 0.4 0.28571429 0.75 0.5 0.8 0.2 0.4 0.28571429 0.25 ] mean value: 0.41214285714285714 key: train_jcc value: [0.65853659 0.65 0.65853659 0.63414634 0.68292683 0.6097561 0.6 0.65 0.71052632 0.6097561 ] mean value: 0.6464184852374839 MCC on Blind test: 0.16 Accuracy on Blind test: 0.79 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.44463158 0.42616534 0.55910635 0.43400979 0.4541378 0.42967033 0.43932247 0.51193166 0.42841673 0.43269181] mean value: 0.4560083866119385 key: score_time value: [0.01107335 0.01112819 0.01112199 0.0153079 0.01128125 0.0111084 0.02174282 0.01112676 0.01113582 0.01421332] mean value: 0.012923979759216308 key: test_mcc value: [0.81649658 0.83666003 0. 0.70710678 0.15811388 0.47809144 0.07559289 0.47809144 0.31428571 0.38575837] mean value: 0.42501971429170726 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91666667 0.91666667 0.5 0.83333333 0.66666667 0.75 0.58333333 0.75 0.66666667 0.72727273] mean value: 0.7310606060606061 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.85714286 0.88888889 0.4 0.8 0.33333333 0.66666667 0.28571429 0.66666667 0.6 0.57142857] mean value: 0.606984126984127 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.8 0.33333333 0.66666667 0.5 0.6 0.5 0.75 0.6 0.66666667] mean value: 0.6416666666666666 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 1. 0.5 1. 0.25 0.75 0.2 0.6 0.6 0.5 ] mean value: 0.615 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 0.9375 0.5 0.875 0.5625 0.75 0.52857143 0.72857143 0.65714286 0.67857143] mean value: 0.7092857142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.75 0.8 0.25 0.66666667 0.2 0.5 0.16666667 0.5 0.42857143 0.4 ] mean value: 0.4661904761904762 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.07 Accuracy on Blind test: 0.69 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01969337 0.00755858 0.00811434 0.00742817 0.0073216 0.00782657 0.00776482 0.00792885 0.00732517 0.00794721] mean value: 0.008890867233276367 key: score_time value: [0.01085663 0.00857472 0.00874376 0.0083375 0.00824308 0.00869703 0.00888395 0.00872183 0.00871086 0.00867748] mean value: 0.008844685554504395 key: test_mcc value: [0.83666003 0.625 0.81649658 0.81649658 0.83666003 1. 
0.50709255 0.84515425 0.65714286 0.81009259] mean value: 0.7750795466933069 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91666667 0.83333333 0.91666667 0.91666667 0.91666667 1. 0.75 0.91666667 0.83333333 0.90909091] mean value: 0.8909090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.75 0.85714286 0.85714286 0.88888889 1. 0.72727273 0.90909091 0.8 0.85714286] mean value: 0.8535569985569985 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.75 1. 1. 0.8 1. 0.66666667 0.83333333 0.8 1. ] mean value: 0.865 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.75 0.75 0.75 1. 1. 0.8 1. 0.8 0.75] mean value: 0.86 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.8125 0.875 0.875 0.9375 1. 0.75714286 0.92857143 0.82857143 0.875 ] mean value: 0.8826785714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.6 0.75 0.75 0.8 1. 0.57142857 0.83333333 0.66666667 0.75 ] mean value: 0.7521428571428571 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.83 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.0870738 0.08674216 0.08633971 0.0796845 0.08702898 0.08723402 0.0800159 0.08267999 0.08336306 0.0805757 ] mean value: 0.08407378196716309 key: score_time value: [0.01838231 0.01821375 0.01814437 0.01772857 0.01825523 0.01835394 0.01846385 0.01686049 0.01691628 0.0185349 ] mean value: 0.01798536777496338 key: test_mcc value: [0.63245553 0.40824829 0.625 1. 0.40824829 0.83666003 0.35675303 0.68313005 0.50709255 0. ] mean value: 0.5457587777402898 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.83333333 0.75 0.83333333 1. 0.75 0.91666667 0.66666667 0.83333333 0.75 0.63636364] mean value: 0.796969696969697 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.57142857 0.75 1. 0.57142857 0.88888889 0.33333333 0.75 0.72727273 0. ] mean value: 0.625901875901876 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_precision value: [1. 0.66666667 0.75 1. 0.66666667 0.8 1. 1. 0.66666667 0. ] mean value: 0.755 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.5 0.75 1. 0.5 1. 0.2 0.6 0.8 0. ] mean value: 0.585 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.6875 0.8125 1. 0.6875 0.9375 0.6 0.8 0.75714286 0.5 ] mean value: 0.7532142857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.4 0.6 1. 0.4 0.8 0.2 0.6 0.57142857 0. ] mean value: 0.5071428571428571 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.16 Accuracy on Blind test: 0.77 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, 
num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00703287 0.00692582 0.00732732 0.00697088 0.00684643 0.00690222 0.00689054 0.00692654 0.00703335 0.00678849] mean value: 0.006964445114135742 key: score_time value: [0.00811267 0.00805044 0.00872636 0.00804448 0.00804257 0.00807214 0.0084269 0.00811172 0.00793386 0.00806904] mean value: 0.008159017562866211 key: test_mcc value: [ 0.63245553 0.63245553 0.25 0. 0.625 0.15811388 0.47809144 0.29277002 -0.23904572 0.38575837] mean value: 0.3215599065732439 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.83333333 0.83333333 0.66666667 0.58333333 0.83333333 0.66666667 0.75 0.66666667 0.41666667 0.72727273] mean value: 0.6977272727272728 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.66666667 0.5 0.28571429 0.75 0.33333333 0.66666667 0.5 0.22222222 0.57142857] mean value: 0.5162698412698412 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 1. 0.5 0.33333333 0.75 0.5 0.75 0.66666667 0.25 0.66666667] mean value: 0.6416666666666666 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.5 0.5 0.25 0.75 0.25 0.6 0.4 0.2 0.5 ] mean value: 0.445 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.75 0.75 0.625 0.5 0.8125 0.5625 0.72857143 0.62857143 0.38571429 0.67857143] mean value: 0.6421428571428571 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.5 0.5 0.33333333 0.16666667 0.6 0.2 0.5 0.33333333 0.125 0.4 ] mean value: 0.36583333333333334 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.6 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [0.9999063 0.96824765 0.96530747 1.01568055 1.00144243 1.0073843 1.0299902 0.9734323 0.96623063 0.96925235] mean value: 0.9896874189376831 key: score_time value: [0.08913732 0.08848977 0.09147906 0.0936265 0.09639764 0.09607625 0.08916879 0.08925462 0.08908725 0.08973861] mean value: 0.09124557971954346 key: test_mcc value: [1. 0.625 0.625 1. 0.40824829 1. 
0.83666003 0.65714286 0.65714286 0.81009259] mean value: 0.7619286618584635 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.83333333 0.83333333 1. 0.75 1. 0.91666667 0.83333333 0.83333333 0.90909091] mean value: 0.8909090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.75 0.75 1. 0.57142857 1. 0.88888889 0.8 0.8 0.85714286] mean value: 0.8417460317460318 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.75 0.75 1. 0.66666667 1. 1. 0.8 0.8 1. ] mean value: 0.8766666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.75 0.75 1. 0.5 1. 0.8 0.8 0.8 0.75] mean value: 0.8150000000000001 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.8125 0.8125 1. 0.6875 1. 0.9 0.82857143 0.82857143 0.875 ] mean value: 0.8744642857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.6 0.6 1. 0.4 1. 0.8 0.66666667 0.66666667 0.75 ] mean value: 0.7483333333333333 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.86 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) key: fit_time value: [1.706635 0.86073232 0.898561 0.83530664 0.94017434 0.93120837 0.82625723 0.85147214 0.83630848 0.80121708] mean value: 0.948787260055542 key: score_time value: [0.23586416 0.21526957 0.2305944 0.21968794 0.23709798 0.14258313 0.17830396 0.22208166 0.23773837 0.23715353] mean value: 0.215637469291687 key: test_mcc value: [0.81649658 0.625 0.625 0.81649658 0.63245553 1. 0.52915026 0.47809144 0.68313005 0.81009259] mean value: 0.701591303820076 key: train_mcc value: [0.94025192 0.9600061 0.94025192 0.9600061 0.9600061 0.9600061 0.95952175 0.95952175 0.97968078 0.94053994] mean value: 0.9559792483395588 key: test_accuracy value: [0.91666667 0.83333333 0.83333333 0.91666667 0.83333333 1. 
0.75 0.75 0.83333333 0.90909091] mean value: 0.8575757575757575 key: train_accuracy value: [0.97196262 0.98130841 0.97196262 0.98130841 0.98130841 0.98130841 0.98130841 0.98130841 0.99065421 0.97222222] mean value: 0.9794652128764278 key: test_fscore value: [0.85714286 0.75 0.75 0.85714286 0.66666667 1. 0.57142857 0.66666667 0.75 0.85714286] mean value: 0.7726190476190475 key: train_fscore value: [0.96 0.97368421 0.96 0.97368421 0.97368421 0.97368421 0.97297297 0.97297297 0.98666667 0.96 ] mean value: 0.9707349454717876 key: test_precision value: [1. 0.75 0.75 1. 1. 1. 1. 0.75 1. 1. ] mean value: 0.925 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.75 0.75 0.75 0.5 1. 0.4 0.6 0.6 0.75] mean value: 0.685 key: train_recall value: [0.92307692 0.94871795 0.92307692 0.94871795 0.94871795 0.94871795 0.94736842 0.94736842 0.97368421 0.92307692] mean value: 0.9432523616734143 key: test_roc_auc value: [0.875 0.8125 0.8125 0.875 0.75 1. 0.7 0.72857143 0.8 0.875 ] mean value: 0.8228571428571428 key: train_roc_auc value: [0.96153846 0.97435897 0.96153846 0.97435897 0.97435897 0.97435897 0.97368421 0.97368421 0.98684211 0.96153846] mean value: 0.9716261808367072 key: test_jcc value: [0.75 0.6 0.6 0.75 0.5 1. 
0.4 0.5 0.6 0.75] mean value: 0.645 key: train_jcc value: [0.92307692 0.94871795 0.92307692 0.94871795 0.94871795 0.94871795 0.94736842 0.94736842 0.97368421 0.92307692] mean value: 0.9432523616734143 MCC on Blind test: 0.14 Accuracy on Blind test: 0.87 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01675677 0.00714898 0.0082562 0.0074234 0.00729084 0.00770926 0.00744081 0.00784683 0.00782228 0.00789189] mean value: 0.00855872631072998 key: score_time value: [0.01316333 0.00809813 0.0098176 0.00799656 0.00866699 0.0089457 0.00823903 0.00895739 0.00873828 0.0089376 ] mean value: 0.009156060218811036 key: test_mcc value: [ 0. 0.25 -0.23904572 0.47809144 0.40824829 0. 
-0.09759001 0.52915026 0.31428571 0.38575837] mean value: 0.20288983564397506 key: train_mcc value: [0.4754902 0.50673892 0.4653488 0.44239297 0.48817818 0.50337256 0.39242808 0.39534618 0.48161946 0.37522992] mean value: 0.4526145268371993 key: test_accuracy value: [0.66666667 0.66666667 0.41666667 0.75 0.75 0.41666667 0.5 0.75 0.66666667 0.72727273] mean value: 0.6310606060606061 key: train_accuracy value: [0.75700935 0.77570093 0.75700935 0.74766355 0.76635514 0.77570093 0.72897196 0.71962617 0.76635514 0.72222222] mean value: 0.7516614745586708 key: test_fscore value: [0. 0.5 0.22222222 0.66666667 0.57142857 0.46153846 0.25 0.57142857 0.6 0.57142857] mean value: 0.4414713064713065 key: train_fscore value: [0.66666667 0.67567568 0.64864865 0.63013699 0.66666667 0.66666667 0.5915493 0.61538462 0.65753425 0.57142857] mean value: 0.6390358039788872 key: test_precision value: [0. 0.5 0.2 0.6 0.66666667 0.33333333 0.33333333 1. 0.6 0.66666667] mean value: 0.49 key: train_precision value: [0.66666667 0.71428571 0.68571429 0.67647059 0.69444444 0.72727273 0.63636364 0.6 0.68571429 0.64516129] mean value: 0.6732093639019635 key: test_recall value: [0. 0.5 0.25 0.75 0.5 0.75 0.2 0.4 0.6 0.5 ] mean value: 0.445 key: train_recall value: [0.66666667 0.64102564 0.61538462 0.58974359 0.64102564 0.61538462 0.55263158 0.63157895 0.63157895 0.51282051] mean value: 0.6097840755735493 key: test_roc_auc value: [0.5 0.625 0.375 0.75 0.6875 0.5 0.45714286 0.7 0.65714286 0.67857143] mean value: 0.5930357142857143 key: train_roc_auc value: [0.7377451 0.74698341 0.72680995 0.71398944 0.73963047 0.74151584 0.68935927 0.69984744 0.73607933 0.67670011] mean value: 0.7208660360817448 key: test_jcc value: [0. 
0.33333333 0.125 0.5 0.4 0.3 0.14285714 0.4 0.42857143 0.4 ] mean value: 0.30297619047619045 key: train_jcc value: [0.5 0.51020408 0.48 0.46 0.5 0.5 0.42 0.44444444 0.48979592 0.4 ] mean value: 0.47044444444444444 MCC on Blind test: 0.14 Accuracy on Blind test: 0.73 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.07072282 0.03617072 0.03579974 0.03728676 0.03555346 0.03524041 0.03362727 0.03646469 0.05544496 0.09033132] mean value: 0.04666421413421631 key: score_time value: [0.01112819 0.0112555 0.01102757 0.01083326 0.01092458 0.01118302 0.01061773 0.01085544 0.00988364 0.0103364 ] mean value: 0.010804533958435059 key: test_mcc value: [1. 0.625 0.81649658 0.81649658 0.83666003 1. 0.65714286 1. 0.65714286 0.81009259] mean value: 0.8219031489976224 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.83333333 0.91666667 0.91666667 0.91666667 1. 0.83333333 1. 0.83333333 0.90909091] mean value: 0.9159090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.75 0.85714286 0.85714286 0.88888889 1. 0.8 1. 0.8 0.85714286] mean value: 0.881031746031746 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.75 1. 1. 0.8 1. 0.8 1. 0.8 1. ] mean value: 0.915 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.75 0.75 0.75 1. 1. 0.8 1. 0.8 0.75] mean value: 0.86 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.8125 0.875 0.875 0.9375 1. 0.82857143 1. 0.82857143 0.875 ] mean value: 0.9032142857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.6 0.75 0.75 0.8 1. 0.66666667 1. 0.66666667 0.75 ] mean value: 0.7983333333333333 key: train_jcc value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.84 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01759219 0.01122999 0.01159906 0.01160884 0.011446 0.0116322 0.01226497 0.01147413 0.01333094 0.01733923] mean value: 0.01295175552368164 key: score_time value: [0.01068926 0.01071072 0.01065826 0.01099992 0.01074338 0.01079488 0.01092148 0.01075888 0.01078367 0.01083136] mean value: 0.010789179801940918 key: test_mcc value: [ 0.625 0.25 0.35355339 0.83666003 0.83666003 0.83666003 0.65714286 1. 0.71428571 -0.17857143] mean value: 0.5931390613052643 key: train_mcc value: [0.90236159 0.96085507 0.96085507 0.90236159 0.93999796 0.92091277 0.92008523 0.92008523 0.92008523 0.96106604] mean value: 0.9308665771065557 key: test_accuracy value: [0.83333333 0.66666667 0.66666667 0.91666667 0.91666667 0.91666667 0.83333333 1. 
0.83333333 0.45454545] mean value: 0.8037878787878788 key: train_accuracy value: [0.95327103 0.98130841 0.98130841 0.95327103 0.97196262 0.96261682 0.96261682 0.96261682 0.96261682 0.98148148] mean value: 0.9673070266528211 key: test_fscore value: [0.75 0.5 0.6 0.88888889 0.88888889 0.88888889 0.8 1. 0.83333333 0.25 ] mean value: 0.74 key: train_fscore value: [0.9382716 0.975 0.975 0.9382716 0.96202532 0.95 0.94871795 0.94871795 0.94871795 0.975 ] mean value: 0.9559722372486086 key: test_precision value: [0.75 0.5 0.5 0.8 0.8 0.8 0.8 1. 0.71428571 0.25 ] mean value: 0.6914285714285715 key: train_precision value: [0.9047619 0.95121951 0.95121951 0.9047619 0.95 0.92682927 0.925 0.925 0.925 0.95121951] mean value: 0.9315011614401858 key: test_recall value: [0.75 0.5 0.75 1. 1. 1. 0.8 1. 1. 0.25] mean value: 0.805 key: train_recall value: [0.97435897 1. 1. 0.97435897 0.97435897 0.97435897 0.97368421 0.97368421 0.97368421 1. ] mean value: 0.9818488529014845 key: test_roc_auc value: [0.8125 0.625 0.6875 0.9375 0.9375 0.9375 0.82857143 1. 0.85714286 0.41071429] mean value: 0.8033928571428571 key: train_roc_auc value: [0.95776772 0.98529412 0.98529412 0.95776772 0.9724736 0.96512066 0.96510297 0.96510297 0.96510297 0.98550725] mean value: 0.9704534119579886 key: test_jcc value: [0.6 0.33333333 0.42857143 0.8 0.8 0.8 0.66666667 1. 
0.71428571 0.14285714] mean value: 0.6285714285714286 key: train_jcc value: [0.88372093 0.95121951 0.95121951 0.88372093 0.92682927 0.9047619 0.90243902 0.90243902 0.90243902 0.95121951] mean value: 0.9160008643275801 MCC on Blind test: 0.07 Accuracy on Blind test: 0.69 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, 
oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02494788 0.01585054 0.00776625 0.00726771 0.00712013 0.00683975 0.00691533 0.00710177 0.00688958 0.00703335] mean value: 0.00977323055267334 key: score_time value: [0.01840568 0.0093627 0.00875354 0.00817943 0.00809598 0.00807095 0.00832725 0.00807333 0.00806046 0.00866556] mean value: 0.009399485588073731 key: test_mcc value: [0.42640143 0.40824829 0.11952286 0.47809144 0.15811388 0.35355339 0.35675303 0.29277002 0.47809144 0.41833001] mean value: 0.3489875814335667 key: train_mcc value: [0.45416735 0.52159509 0.45416735 0.49964579 0.52383566 0.43117964 0.47315489 0.49023798 0.44470372 0.45631672] mean value: 0.4749004183145091 key: test_accuracy value: [0.75 0.75 0.58333333 0.75 0.66666667 0.66666667 0.66666667 0.66666667 0.75 0.72727273] mean value: 0.6977272727272728 key: train_accuracy value: [0.75700935 0.78504673 0.75700935 0.77570093 
0.78504673 0.74766355 0.76635514 0.77570093 0.75700935 0.75925926] mean value: 0.7665801315334025 key: test_fscore value: [0.4 0.57142857 0.44444444 0.66666667 0.33333333 0.6 0.33333333 0.5 0.66666667 0.4 ] mean value: 0.4915873015873016 key: train_fscore value: [0.60606061 0.66666667 0.60606061 0.625 0.63492063 0.58461538 0.63768116 0.625 0.59375 0.60606061] mean value: 0.6185815663804795 key: test_precision value: [1. 0.66666667 0.4 0.6 0.5 0.5 1. 0.66666667 0.75 1. ] mean value: 0.7083333333333334 key: train_precision value: [0.74074074 0.76666667 0.74074074 0.8 0.83333333 0.73076923 0.70967742 0.76923077 0.73076923 0.74074074] mean value: 0.7562668872346292 key: test_recall value: [0.25 0.5 0.5 0.75 0.25 0.75 0.2 0.4 0.6 0.25] mean value: 0.445 key: train_recall value: [0.51282051 0.58974359 0.51282051 0.51282051 0.51282051 0.48717949 0.57894737 0.52631579 0.5 0.51282051] mean value: 0.5246288798920378 key: test_roc_auc value: [0.625 0.6875 0.5625 0.75 0.5625 0.6875 0.6 0.62857143 0.72857143 0.625 ] mean value: 0.6457142857142857 key: train_roc_auc value: [0.70493967 0.74340121 0.70493967 0.71964555 0.72699849 0.69211916 0.72425629 0.71967963 0.69927536 0.70568562] mean value: 0.7140940648394546 key: test_jcc value: [0.25 0.4 0.28571429 0.5 0.2 0.42857143 0.2 0.33333333 0.5 0.25 ] mean value: 0.33476190476190476 key: train_jcc value: [0.43478261 0.5 0.43478261 0.45454545 0.46511628 0.41304348 0.46808511 0.45454545 0.42222222 0.43478261] mean value: 0.44819058211137036 MCC on Blind test: 0.14 Accuracy on Blind test: 0.75 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), 
('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 
'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00787282 0.00709176 0.00747252 0.00727296 0.00753498 0.0074842 0.00761414 0.00750947 0.00763249 0.00774121] mean value: 0.00752265453338623 key: score_time value: [0.00790644 0.00816011 0.00783634 0.00811148 0.00789452 0.00806546 0.00857043 0.00817347 0.00801802 0.00889587] mean value: 0.008163213729858398 key: test_mcc value: [1. 0.625 0.11952286 0.70710678 0.47809144 0.70710678 0.37142857 0.84515425 0.29277002 0.60714286] mean value: 0.5753323572224797 key: train_mcc value: [0.8165399 0.85945065 0.82420912 0.82726738 0.76153359 0.83287099 0.79235477 0.84830731 0.84110073 0.8789655 ] mean value: 0.8282599941345357 key: test_accuracy value: [1. 0.83333333 0.58333333 0.83333333 0.75 0.83333333 0.66666667 0.91666667 0.66666667 0.81818182] mean value: 0.7901515151515152 key: train_accuracy value: [0.90654206 0.93457944 0.91588785 0.91588785 0.87850467 0.91588785 0.88785047 0.92523364 0.92523364 0.94444444] mean value: 0.9150051921079958 key: test_fscore value: [1. 0.75 0.44444444 0.8 0.66666667 0.8 0.66666667 0.90909091 0.5 0.75 ] mean value: 0.7286868686868687 key: train_fscore value: [0.88372093 0.90410959 0.86956522 0.89156627 0.85057471 0.89411765 0.86363636 0.90243902 0.88235294 0.92105263] mean value: 0.8863135322209726 key: test_precision value: [1. 0.75 0.4 0.66666667 0.6 0.66666667 0.57142857 0.83333333 0.66666667 0.75 ] mean value: 0.6904761904761905 key: train_precision value: [0.80851064 0.97058824 1. 0.84090909 0.77083333 0.82608696 0.76 0.84090909 1. 0.94594595] mean value: 0.876378329121119 key: test_recall value: [1. 
0.75 0.5 1. 0.75 1. 0.8 1. 0.4 0.75] mean value: 0.795 key: train_recall value: [0.97435897 0.84615385 0.76923077 0.94871795 0.94871795 0.97435897 1. 0.97368421 0.78947368 0.8974359 ] mean value: 0.9122132253711202 key: test_roc_auc value: [1. 0.8125 0.5625 0.875 0.75 0.875 0.68571429 0.92857143 0.62857143 0.80357143] mean value: 0.7921428571428571 key: train_roc_auc value: [0.92100302 0.91572398 0.88461538 0.92288839 0.89347662 0.92835596 0.91304348 0.93611747 0.89473684 0.9342252 ] mean value: 0.9144186331459181 key: test_jcc value: [1. 0.6 0.28571429 0.66666667 0.5 0.66666667 0.5 0.83333333 0.33333333 0.6 ] mean value: 0.5985714285714285 key: train_jcc value: [0.79166667 0.825 0.76923077 0.80434783 0.74 0.80851064 0.76 0.82222222 0.78947368 0.85365854] mean value: 0.7964110343300379 MCC on Blind test: 0.04 Accuracy on Blind test: 0.82 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, 
min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00992465 0.00924659 0.00704741 0.00705194 0.00760007 0.00777459 0.00699854 0.00773525 0.00748038 0.00777173] mean value: 0.007863116264343262 key: score_time value: [0.01014447 0.00912404 0.00799847 0.00816655 0.00813746 0.00799036 0.00796533 0.00817347 0.00830007 0.00831962] mean value: 0.00843198299407959 key: test_mcc value: [1. 0.40824829 0.40824829 0.625 0.81649658 0.625 0.83666003 0.65714286 0.23904572 0.41833001] mean value: 0.6034171780666301 key: train_mcc value: [0.8720951 0.89986237 0.74811148 0.77945561 0.86259524 0.6717753 0.93862091 0.88019137 0.69504805 0.78691217] mean value: 0.8134667600062208 key: test_accuracy value: [1. 0.75 0.75 0.83333333 0.91666667 0.83333333 0.91666667 0.83333333 0.58333333 0.72727273] mean value: 0.8143939393939394 key: train_accuracy value: [0.93457944 0.95327103 0.87850467 0.89719626 0.93457944 0.8411215 0.97196262 0.94392523 0.82242991 0.89814815] mean value: 0.9075718241606092 key: test_fscore value: [1. 0.57142857 0.57142857 0.75 0.85714286 0.75 0.88888889 0.8 0.61538462 0.4 ] mean value: 0.7204273504273505 key: train_fscore value: [0.91764706 0.93670886 0.8 0.86075949 0.91358025 0.72131148 0.96 0.91428571 0.8 0.8358209 ] mean value: 0.8660113745385427 key: test_precision value: [1. 0.66666667 0.66666667 0.75 1. 0.75 1. 0.8 0.5 1. ] mean value: 0.8133333333333334 key: train_precision value: [0.84782609 0.925 1. 0.85 0.88095238 1. 0.97297297 1. 0.66666667 1. ] mean value: 0.9143418107548542 key: test_recall value: [1. 
0.5 0.5 0.75 0.75 0.75 0.8 0.8 0.8 0.25] mean value: 0.6900000000000001 key: train_recall value: [1. 0.94871795 0.66666667 0.87179487 0.94871795 0.56410256 0.94736842 0.84210526 1. 0.71794872] mean value: 0.8507422402159244 key: test_roc_auc value: [1. 0.6875 0.6875 0.8125 0.875 0.8125 0.9 0.82857143 0.61428571 0.625 ] mean value: 0.7842857142857143 key: train_roc_auc value: [0.94852941 0.95230015 0.83333333 0.89177979 0.93759427 0.78205128 0.96643783 0.92105263 0.86231884 0.85897436] mean value: 0.8954371900141855 key: test_jcc value: [1. 0.4 0.4 0.6 0.75 0.6 0.8 0.66666667 0.44444444 0.25 ] mean value: 0.5911111111111111 key: train_jcc value: [0.84782609 0.88095238 0.66666667 0.75555556 0.84090909 0.56410256 0.92307692 0.84210526 0.66666667 0.71794872] mean value: 0.7705809915992983 MCC on Blind test: 0.07 Accuracy on Blind test: 0.91 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, 
missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.07426667 0.06154084 0.06283951 0.06345463 0.06271338 0.06455684 0.06105089 0.06551576 0.06561875 0.06309128] mean value: 0.0644648551940918 key: score_time value: [0.01463723 0.01418447 0.01481771 0.01499844 0.01489115 0.01537299 0.01565957 0.01581383 0.01552248 0.01574159] mean value: 0.015163946151733398 key: test_mcc value: [1. 0.625 0.625 0.625 0.83666003 1. 0.52915026 1. 0.65714286 0.81009259] mean value: 0.7708045733190834 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.83333333 0.83333333 0.83333333 0.91666667 1. 0.75 1. 0.83333333 0.90909091] mean value: 0.8909090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.75 0.75 0.75 0.88888889 1. 0.57142857 1. 0.8 0.85714286] mean value: 0.8367460317460318 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.75 0.75 0.75 0.8 1. 1. 1. 0.8 1. ] mean value: 0.885 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.75 0.75 0.75 1. 1. 0.4 1. 0.8 0.75] mean value: 0.8200000000000001 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.8125 0.8125 0.8125 0.9375 1. 0.7 1. 0.82857143 0.875 ] mean value: 0.8778571428571429 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.6 0.6 0.6 0.8 1. 0.4 1. 
0.66666667 0.75 ] mean value: 0.7416666666666667 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.78 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian 
Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.02740383 0.02774906 0.03296709 0.04285073 0.033988 0.02595377 0.04620218 0.03556585 0.02733755 0.03265309] mean value: 0.03326711654663086 key: score_time value: [0.02362061 0.02275753 0.03781056 0.03313112 0.02986407 0.02139044 0.03054595 0.02594328 0.02176881 0.02311182] mean value: 0.02699441909790039 key: test_mcc value: [0.83666003 0.625 0.81649658 1. 0.83666003 1. 0.83666003 1. 0.65714286 0.81009259] mean value: 0.8418712104973792 key: train_mcc value: [1. 0.97991726 1. 0.97991726 0.97991726 1. 1. 1. 1. 0.98002018] mean value: 0.9919771953521386 key: test_accuracy value: [0.91666667 0.83333333 0.91666667 1. 
0.91666667 1. 0.91666667 1. 0.83333333 0.90909091] mean value: 0.9242424242424242 key: train_accuracy value: [1. 0.99065421 1. 0.99065421 0.99065421 1. 1. 1. 1. 0.99074074] mean value: 0.9962703357563171 key: test_fscore value: [0.88888889 0.75 0.85714286 1. 0.88888889 1. 0.88888889 1. 0.8 0.85714286] mean value: 0.8930952380952382 key: train_fscore value: [1. 0.98701299 1. 0.98701299 0.98701299 1. 1. 1. 1. 0.98701299] mean value: 0.9948051948051948 key: test_precision value: [0.8 0.75 1. 1. 0.8 1. 1. 1. 0.8 1. ] mean value: 0.915 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.75 0.75 1. 1. 1. 0.8 1. 0.8 0.75] mean value: 0.885 key: train_recall value: [1. 0.97435897 1. 0.97435897 0.97435897 1. 1. 1. 1. 0.97435897] mean value: 0.9897435897435898 key: test_roc_auc value: [0.9375 0.8125 0.875 1. 0.9375 1. 0.9 1. 0.82857143 0.875 ] mean value: 0.9166071428571428 key: train_roc_auc value: [1. 0.98717949 1. 0.98717949 0.98717949 1. 1. 1. 1. 0.98717949] mean value: 0.9948717948717949 key: test_jcc value: [0.8 0.6 0.75 1. 0.8 1. 0.8 1. 0.66666667 0.75 ] mean value: 0.8166666666666667 key: train_jcc value: [1. 0.97435897 1. 0.97435897 0.97435897 1. 1. 1. 1. 
0.97435897] mean value: 0.9897435897435898 MCC on Blind test: 0.13 Accuracy on Blind test: 0.86 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.02972174 0.03583384 0.03609109 0.03561258 0.0359695 0.03587818 0.03609157 0.03597355 0.03254533 0.03658676] mean value: 0.0350304126739502 key: score_time value: [0.02094769 0.02031541 0.02000165 0.02015448 0.01970243 0.01941609 0.01103115 0.02209592 0.0211103 0.02550483] mean value: 0.020027995109558105 key: test_mcc value: [0.42640143 0.15811388 0.40824829 0.63245553 0.15811388 0.40824829 0.35675303 0.07559289 0.11952286 0. 
] mean value: 0.27434501012310836 key: train_mcc value: [0.94025192 0.94025192 0.97991726 0.92064018 0.92064018 0.92064018 0.93950808 0.93950808 0.93950808 0.94053994] mean value: 0.9381405840047681 key: test_accuracy value: [0.75 0.66666667 0.75 0.83333333 0.66666667 0.75 0.66666667 0.58333333 0.58333333 0.63636364] mean value: 0.6886363636363636 key: train_accuracy value: [0.97196262 0.97196262 0.99065421 0.96261682 0.96261682 0.96261682 0.97196262 0.97196262 0.97196262 0.97222222] mean value: 0.9710539979231568 key: test_fscore value: [0.4 0.33333333 0.57142857 0.66666667 0.33333333 0.57142857 0.33333333 0.28571429 0.44444444 0. ] mean value: 0.39396825396825397 key: train_fscore value: [0.96 0.96 0.98701299 0.94594595 0.94594595 0.94594595 0.95890411 0.95890411 0.95890411 0.96 ] mean value: 0.9581563153617948 key: test_precision value: [1. 0.5 0.66666667 1. 0.5 0.66666667 1. 0.5 0.5 0. ] mean value: 0.6333333333333333 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.25 0.25 0.5 0.5 0.25 0.5 0.2 0.2 0.4 0. ] mean value: 0.305 key: train_recall value: [0.92307692 0.92307692 0.97435897 0.8974359 0.8974359 0.8974359 0.92105263 0.92105263 0.92105263 0.92307692] mean value: 0.9199055330634278 key: test_roc_auc value: [0.625 0.5625 0.6875 0.75 0.5625 0.6875 0.6 0.52857143 0.55714286 0.5 ] mean value: 0.6060714285714286 key: train_roc_auc value: [0.96153846 0.96153846 0.98717949 0.94871795 0.94871795 0.94871795 0.96052632 0.96052632 0.96052632 0.96153846] mean value: 0.9599527665317139 key: test_jcc value: [0.25 0.2 0.4 0.5 0.2 0.4 0.2 0.16666667 0.28571429 0. 
] mean value: 0.26023809523809527 key: train_jcc value: [0.92307692 0.92307692 0.97435897 0.8974359 0.8974359 0.8974359 0.92105263 0.92105263 0.92105263 0.92307692] mean value: 0.9199055330634278 MCC on Blind test: 0.05 Accuracy on Blind test: 0.85 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.08681178 0.08099294 0.0833962 0.08435941 0.08145213 0.08759737 0.08979297 0.08592725 0.08576846 0.07664156] mean value: 0.08427400588989258 key: score_time value: [0.00886655 0.00919652 0.00933671 0.00866079 0.00926757 0.00908661 0.00926757 0.0088954 0.00944066 0.0094223 ] mean value: 0.009144067764282227 key: test_mcc value: [0.83666003 0.625 0.81649658 0.81649658 0.83666003 0.83666003 0.65714286 0.84515425 0.65714286 0.81009259] mean value: 0.7737505797772892 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.91666667 0.83333333 0.91666667 0.91666667 0.91666667 0.91666667 0.83333333 0.91666667 0.83333333 0.90909091] mean value: 0.8909090909090909 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.88888889 0.75 0.85714286 0.85714286 0.88888889 0.88888889 0.8 0.90909091 0.8 0.85714286] mean value: 0.8497186147186148 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.75 1. 1. 0.8 0.8 0.8 0.83333333 0.8 1. ] mean value: 0.8583333333333334 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.75 0.75 0.75 1. 1. 0.8 1. 0.8 0.75] mean value: 0.86 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.8125 0.875 0.875 0.9375 0.9375 0.82857143 0.92857143 0.82857143 0.875 ] mean value: 0.8835714285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.6 0.75 0.75 0.8 0.8 0.66666667 0.83333333 0.66666667 0.75 ] mean value: 0.7416666666666667 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.82 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. 
Use `zero_division` parameter to control this behavior.
_warn_prf(average, modifier, msg_start, len(result)) /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient 
Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00962162 0.01063228 0.01091313 0.01079535 0.01196051 0.0170722 0.01151824 0.01096225 0.02570939 0.01203942] mean value: 0.01312243938446045 key: score_time value: [0.01107264 0.01098132 0.01062155 0.01118159 0.01129007 0.01117349 0.01122856 0.01094556 0.01141524 0.01125026] mean value: 0.01111602783203125 key: test_mcc value: [0. 0. 0. 0. 0. 0. 0. 0.07559289 0. 0. ] mean value: 0.007559289460184544 key: train_mcc value: [0.32183783 0.32183783 0.32183783 0.18223949 0.26021572 0.26021572 0.32843368 0.32843368 0.29834424 0.29306141] mean value: 0.2916457442021758 key: test_accuracy value: [0.66666667 0.66666667 0.66666667 0.66666667 0.66666667 0.66666667 0.58333333 0.58333333 0.58333333 0.63636364] mean value: 0.6386363636363637 key: train_accuracy value: [0.69158879 0.69158879 0.69158879 0.65420561 0.6728972 0.6728972 0.70093458 0.70093458 0.69158879 0.68518519] mean value: 0.6853409484250605 key: test_fscore value: [0. 0. 0. 0. 0. 0. 0. 
0.28571429 0. 0. ] mean value: 0.028571428571428574 key: train_fscore value: [0.26666667 0.26666667 0.26666667 0.09756098 0.18604651 0.18604651 0.27272727 0.27272727 0.23255814 0.22727273] mean value: 0.22749394111277266 key: test_precision value: [0. 0. 0. 0. 0. 0. 0. 0.5 0. 0. ] mean value: 0.05 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0. 0. 0. 0. 0. 0. 0. 0.2 0. 0. ] mean value: 0.02 key: train_recall value: [0.15384615 0.15384615 0.15384615 0.05128205 0.1025641 0.1025641 0.15789474 0.15789474 0.13157895 0.12820513] mean value: 0.12935222672064778 key: test_roc_auc value: [0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.52857143 0.5 0.5 ] mean value: 0.5028571428571429 key: train_roc_auc value: [0.57692308 0.57692308 0.57692308 0.52564103 0.55128205 0.55128205 0.57894737 0.57894737 0.56578947 0.56410256] mean value: 0.5646761133603239 key: test_jcc value: [0. 0. 0. 0. 0. 0. 0. 0.16666667 0. 0. ] mean value: 0.016666666666666666 key: train_jcc value: [0.15384615 0.15384615 0.15384615 0.05128205 0.1025641 0.1025641 0.15789474 0.15789474 0.13157895 0.12820513] mean value: 0.12935222672064778 MCC on Blind test: -0.02 Accuracy on Blind test: 0.95 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.0105691 0.01015902 0.00814056 0.00781894 0.00773811 0.00766277 0.00809789 0.0085144 0.00822377 0.008322 ] mean value: 0.008524656295776367 key: score_time value: [0.01082993 0.00936484 0.00863528 0.00822663 0.00833321 0.00819302 0.0086484 0.00855327 0.00863814 0.00825906] mean value: 0.008768177032470703 key: test_mcc value: [0.63245553 0.40824829 0.35355339 1. 0.625 0.70710678 0.68313005 0.83666003 0.31428571 0.69006556] mean value: 0.6250505345503478 key: train_mcc value: [0.79826546 0.8375252 0.89876312 0.85818605 0.85972678 0.87895928 0.81760898 0.83676583 0.89756105 0.83946488] mean value: 0.8522826622791292 key: test_accuracy value: [0.83333333 0.75 0.66666667 1. 0.83333333 0.83333333 0.83333333 0.91666667 0.66666667 0.81818182] mean value: 0.8151515151515152 key: train_accuracy value: [0.90654206 0.92523364 0.95327103 0.93457944 0.93457944 0.94392523 0.91588785 0.92523364 0.95327103 0.92592593] mean value: 0.9318449290411908 key: test_fscore value: [0.66666667 0.57142857 0.6 1. 0.75 0.8 0.75 0.88888889 0.6 0.8 ] mean value: 0.7426984126984127 key: train_fscore value: [0.87179487 0.89473684 0.93506494 0.90909091 0.91139241 0.92307692 0.88311688 0.89473684 0.93333333 0.8974359 ] mean value: 0.905377984218757 key: test_precision value: [1. 0.66666667 0.5 1. 0.75 0.66666667 1. 1. 0.6 0.66666667] mean value: 0.785 key: train_precision value: [0.87179487 0.91891892 0.94736842 0.92105263 0.9 0.92307692 0.87179487 0.89473684 0.94594595 0.8974359 ] mean value: 0.9092125323704271 key: test_recall value: [0.5 0.5 0.75 1. 0.75 1. 
0.6 0.8 0.6 1. ] mean value: 0.75 key: train_recall value: [0.87179487 0.87179487 0.92307692 0.8974359 0.92307692 0.92307692 0.89473684 0.89473684 0.92105263 0.8974359 ] mean value: 0.9018218623481782 key: test_roc_auc value: [0.75 0.6875 0.6875 1. 0.8125 0.875 0.8 0.9 0.65714286 0.85714286] mean value: 0.8026785714285715 key: train_roc_auc value: [0.89913273 0.91383861 0.94683258 0.92665913 0.9321267 0.93947964 0.91113654 0.91838291 0.94603356 0.91973244] mean value: 0.9253354836037566 key: test_jcc value: [0.5 0.4 0.42857143 1. 0.6 0.66666667 0.6 0.8 0.42857143 0.66666667] mean value: 0.6090476190476191 key: train_jcc value: [0.77272727 0.80952381 0.87804878 0.83333333 0.8372093 0.85714286 0.79069767 0.80952381 0.875 0.81395349] mean value: 0.8277160327855166 MCC on Blind test: 0.08 Accuracy on Blind test: 0.74 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', 
n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.07413816 0.06058931 0.06105781 0.06061363 0.0620904 0.06173635 0.06109071 0.06248355 0.06057048 0.06052804] mean value: 0.06248984336853027 key: score_time value: [0.00838947 0.00880098 0.00829411 0.0082829 0.00848746 0.00843048 0.00848484 0.00820589 0.00828862 0.00822687] mean value: 0.008389163017272949 key: test_mcc value: [0.63245553 0.40824829 0.35355339 1. 0.625 0.70710678 0.68313005 0.83666003 0.31428571 0.69006556] mean value: 0.6250505345503478 key: train_mcc value: [0.79826546 0.8375252 0.89876312 0.85818605 0.85972678 0.87895928 0.81760898 0.83676583 0.89756105 0.83946488] mean value: 0.8522826622791292 key: test_accuracy value: [0.83333333 0.75 0.66666667 1. 
0.83333333 0.83333333 0.83333333 0.91666667 0.66666667 0.81818182] mean value: 0.8151515151515152 key: train_accuracy value: [0.90654206 0.92523364 0.95327103 0.93457944 0.93457944 0.94392523 0.91588785 0.92523364 0.95327103 0.92592593] mean value: 0.9318449290411908 /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:122: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:125: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy baseline_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_fscore value: [0.66666667 0.57142857 0.6 1. 0.75 0.8 0.75 0.88888889 0.6 0.8 ] mean value: 0.7426984126984127 key: train_fscore value: [0.87179487 0.89473684 0.93506494 0.90909091 0.91139241 0.92307692 0.88311688 0.89473684 0.93333333 0.8974359 ] mean value: 0.905377984218757 key: test_precision value: [1. 0.66666667 0.5 1. 0.75 0.66666667 1. 1. 0.6 0.66666667] mean value: 0.785 key: train_precision value: [0.87179487 0.91891892 0.94736842 0.92105263 0.9 0.92307692 0.87179487 0.89473684 0.94594595 0.8974359 ] mean value: 0.9092125323704271 key: test_recall value: [0.5 0.5 0.75 1. 0.75 1. 0.6 0.8 0.6 1. ] mean value: 0.75 key: train_recall value: [0.87179487 0.87179487 0.92307692 0.8974359 0.92307692 0.92307692 0.89473684 0.89473684 0.92105263 0.8974359 ] mean value: 0.9018218623481782 key: test_roc_auc value: [0.75 0.6875 0.6875 1. 
0.8125 0.875 0.8 0.9 0.65714286 0.85714286] mean value: 0.8026785714285715 key: train_roc_auc value: [0.89913273 0.91383861 0.94683258 0.92665913 0.9321267 0.93947964 0.91113654 0.91838291 0.94603356 0.91973244] mean value: 0.9253354836037566 key: test_jcc value: [0.5 0.4 0.42857143 1. 0.6 0.66666667 0.6 0.8 0.42857143 0.66666667] mean value: 0.6090476190476191 key: train_jcc value: [0.77272727 0.80952381 0.87804878 0.83333333 0.8372093 0.85714286 0.79069767 0.80952381 0.875 0.81395349] mean value: 0.8277160327855166 MCC on Blind test: 0.08 Accuracy on Blind test: 0.74 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, 
validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.01721478 0.01214457 0.01244617 0.01377439 0.01370144 0.01362348 0.01294899 0.01230645 0.01298475 0.0133431 ] mean value: 0.013448810577392578 key: score_time value: [0.01065063 0.00836754 0.00845647 0.00828004 0.00818396 0.00835776 0.00841212 0.0085628 0.00873351 0.00875807] mean value: 0.008676290512084961 key: test_mcc value: [0.8819171 0.5 0.37796447 0.875 1. 0.60714286 0.76376262 1. 
0.64465837 0.60714286] mean value: 0.7257588278029415 key: train_mcc value: [0.79411765 0.85331034 0.79599234 0.76678748 0.81031543 0.82480818 0.81031543 0.82480818 0.79688349 0.85400682] mean value: 0.8131345350406455 key: test_accuracy value: [0.9375 0.75 0.66666667 0.93333333 1. 0.8 0.86666667 1. 0.8 0.8 ] mean value: 0.8554166666666667 key: train_accuracy value: [0.89705882 0.92647059 0.89781022 0.88321168 0.90510949 0.91240876 0.90510949 0.91240876 0.89781022 0.9270073 ] mean value: 0.9064405324173466 key: test_fscore value: [0.93333333 0.75 0.70588235 0.93333333 1. 0.8 0.85714286 1. 0.84210526 0.8 ] mean value: 0.8621797139908595 key: train_fscore value: [0.89705882 0.92537313 0.89705882 0.88235294 0.90510949 0.91304348 0.90510949 0.91176471 0.89393939 0.92647059] mean value: 0.9057280866983752 key: test_precision value: [1. 0.75 0.6 0.875 1. 0.75 1. 1. 0.72727273 0.85714286] mean value: 0.8559415584415584 key: train_precision value: [0.89705882 0.93939394 0.91044776 0.89552239 0.91176471 0.91304348 0.89855072 0.91176471 0.921875 0.92647059] mean value: 0.9125892115075633 key: test_recall value: [0.875 0.75 0.85714286 1. 1. 0.85714286 0.75 1. 1. 0.75 ] mean value: 0.8839285714285714 key: train_recall value: [0.89705882 0.91176471 0.88405797 0.86956522 0.89855072 0.91304348 0.91176471 0.91176471 0.86764706 0.92647059] mean value: 0.8991687979539642 key: test_roc_auc value: [0.9375 0.75 0.67857143 0.9375 1. 0.80357143 0.875 1. 0.78571429 0.80357143] mean value: 0.8571428571428571 key: train_roc_auc value: [0.89705882 0.92647059 0.89791134 0.88331202 0.90515772 0.91240409 0.90515772 0.91240409 0.89759165 0.92700341] mean value: 0.9064471440750212 key: test_jcc value: [0.875 0.6 0.54545455 0.875 1. 0.66666667 0.75 1. 
0.72727273 0.66666667] mean value: 0.7706060606060606 key: train_jcc value: [0.81333333 0.86111111 0.81333333 0.78947368 0.82666667 0.84 0.82666667 0.83783784 0.80821918 0.8630137 ] mean value: 0.8279655509871804 MCC on Blind test: 0.11 Accuracy on Blind test: 0.65 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.39265418 0.37149286 0.37123227 0.37943006 0.37867284 0.38540936 0.36591649 0.36693406 0.37429166 0.37283516] mean value: 0.3758868932723999 key: score_time value: [0.00942802 0.0091536 0.0086658 0.00927925 0.00937963 0.00886846 0.00882983 0.00927758 0.009166 0.00868988] mean value: 0.009073805809020997 key: test_mcc value: [0.8819171 0.62994079 0.49099025 0.76376262 0.73214286 0.60714286 0.6000992 1. 0.64465837 0.33928571] mean value: 0.6689939758834577 key: train_mcc value: [0.91215932 0.92657079 0.94201665 0.8978896 0.97080136 0.97122151 0.94201665 0.88320546 1. 0.95630861] mean value: 0.9402189943658086 key: test_accuracy value: [0.9375 0.8125 0.73333333 0.86666667 0.86666667 0.8 0.8 1. 
0.8 0.66666667] mean value: 0.8283333333333334 key: train_accuracy value: [0.95588235 0.96323529 0.97080292 0.94890511 0.98540146 0.98540146 0.97080292 0.94160584 1. 0.97810219] mean value: 0.9700139544869043 key: test_fscore value: [0.93333333 0.82352941 0.75 0.875 0.85714286 0.8 0.82352941 1. 0.84210526 0.66666667] mean value: 0.8371306943830163 key: train_fscore value: [0.95652174 0.96350365 0.97058824 0.94964029 0.98550725 0.98529412 0.97101449 0.94117647 1. 0.97810219] mean value: 0.9701348428976124 key: test_precision value: [1. 0.77777778 0.66666667 0.77777778 0.85714286 0.75 0.77777778 1. 0.72727273 0.71428571] mean value: 0.8048701298701298 key: train_precision value: [0.94285714 0.95652174 0.98507463 0.94285714 0.98550725 1. 0.95714286 0.94117647 1. 0.97101449] mean value: 0.968215171857192 key: test_recall value: [0.875 0.875 0.85714286 1. 0.85714286 0.85714286 0.875 1. 1. 0.625 ] mean value: 0.8821428571428571 key: train_recall value: [0.97058824 0.97058824 0.95652174 0.95652174 0.98550725 0.97101449 0.98529412 0.94117647 1. 0.98529412] mean value: 0.9722506393861893 key: test_roc_auc value: [0.9375 0.8125 0.74107143 0.875 0.86607143 0.80357143 0.79464286 1. 0.78571429 0.66964286] mean value: 0.8285714285714286 key: train_roc_auc value: [0.95588235 0.96323529 0.97090793 0.9488491 0.98540068 0.98550725 0.97090793 0.94160273 1. 0.97815431] mean value: 0.9700447570332481 key: test_jcc value: [0.875 0.7 0.6 0.77777778 0.75 0.66666667 0.7 1. 0.72727273 0.5 ] mean value: 0.7296717171717172 key: train_jcc value: [0.91666667 0.92957746 0.94285714 0.90410959 0.97142857 0.97101449 0.94366197 0.88888889 1. 
0.95714286] mean value: 0.9425347645398564 MCC on Blind test: 0.07 Accuracy on Blind test: 0.72 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.00969172 0.00904489 0.00695467 0.00679207 0.00658846 0.00662065 0.00658751 0.00682616 0.00665498 0.00697279] mean value: 0.007273387908935547 key: score_time value: [0.01047778 0.01015592 0.00812674 0.0078907 0.00783539 0.00783157 0.00781465 0.00791001 0.0078311 0.00797391] mean value: 0.008384776115417481 key: test_mcc value: [0.8819171 0.5 0.33928571 0.56407607 0.49099025 0.60714286 0.46428571 0.73214286 0.64465837 0.07142857] mean value: 0.5295927517042964 key: train_mcc value: [0.61098829 0.74337629 0.6462903 0.59999905 0.55137884 0.71313464 0.65613085 0.71021843 0.63063055 0.63867147] mean value: 0.6500818694571209 key: test_accuracy value: [0.9375 0.75 0.66666667 0.73333333 0.73333333 0.8 0.73333333 0.86666667 0.8 0.53333333] mean value: 0.7554166666666666 key: train_accuracy value: [0.80147059 0.86764706 0.81751825 0.79562044 0.76642336 0.8540146 0.81751825 0.84671533 0.81021898 0.81021898] mean value: 0.8187365822241305 key: test_fscore value: [0.94117647 
0.75 0.66666667 0.77777778 0.75 0.8 0.75 0.875 0.84210526 0.53333333] mean value: 0.7686059511523908 key: train_fscore value: [0.81632653 0.87671233 0.83443709 0.81333333 0.79487179 0.84615385 0.83660131 0.82644628 0.82432432 0.82894737] mean value: 0.8298154200757712 key: test_precision value: [0.88888889 0.75 0.625 0.63636364 0.66666667 0.75 0.75 0.875 0.72727273 0.57142857] mean value: 0.7240620490620491 key: train_precision value: [0.75949367 0.82051282 0.76829268 0.75308642 0.71264368 0.90163934 0.75294118 0.94339623 0.7625 0.75 ] mean value: 0.7924506019387709 key: test_recall value: [1. 0.75 0.71428571 1. 0.85714286 0.85714286 0.75 0.875 1. 0.5 ] mean value: 0.8303571428571428 key: train_recall value: [0.88235294 0.94117647 0.91304348 0.88405797 0.89855072 0.79710145 0.94117647 0.73529412 0.89705882 0.92647059] mean value: 0.8816283034953112 key: test_roc_auc value: [0.9375 0.75 0.66964286 0.75 0.74107143 0.80357143 0.73214286 0.86607143 0.78571429 0.53571429] mean value: 0.7571428571428571 key: train_roc_auc value: [0.80147059 0.86764706 0.81681586 0.79497016 0.76545183 0.85443308 0.81841432 0.84590793 0.81084825 0.81106138] mean value: 0.8187020460358057 key: test_jcc value: [0.88888889 0.6 0.5 0.63636364 0.6 0.66666667 0.6 0.77777778 0.72727273 0.36363636] mean value: 0.636060606060606 key: train_jcc value: [0.68965517 0.7804878 0.71590909 0.68539326 0.65957447 0.73333333 0.71910112 0.70422535 0.70114943 0.70786517] mean value: 0.7096694197581203 MCC on Blind test: 0.03 Accuracy on Blind test: 0.49 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra 
Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 
'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00759959 0.00745654 0.0069356 0.00687218 0.00700045 0.00688291 0.0068655 0.00682497 0.00688171 0.00696039] mean value: 0.007027983665466309 key: score_time value: [0.00797582 0.00790715 0.00785089 0.0078764 0.00810456 0.00792956 0.0079031 0.00789905 0.00793099 0.00804949] mean value: 0.007942700386047363 key: test_mcc value: [0.37796447 0.25819889 0.07142857 0.49099025 0.47245559 0.13363062 0.46428571 0.73214286 0.33928571 0.32732684] mean value: 0.36677095205019633 key: train_mcc value: [0.5008673 0.53311399 0.52059257 0.45151662 0.49006025 0.5360985 0.52559229 0.51215762 0.49197671 0.53517487] mean value: 0.5097150730382196 key: test_accuracy value: [0.6875 0.625 0.53333333 0.73333333 0.73333333 0.53333333 0.73333333 0.86666667 0.66666667 0.66666667] mean value: 0.6779166666666666 key: train_accuracy value: [0.75 0.76470588 0.75912409 0.72262774 0.74452555 0.76642336 0.75912409 0.75182482 0.74452555 0.76642336] mean value: 0.7529304422498926 key: test_fscore value: [0.70588235 0.57142857 0.53333333 0.75 0.66666667 0.63157895 0.75 0.875 0.66666667 0.70588235] mean value: 0.6856438891346012 key: train_fscore value: [0.75714286 0.77777778 0.77241379 0.74666667 0.75524476 0.78082192 0.7755102 0.77027027 0.75524476 0.77464789] mean value: 0.7665740884664326 key: test_precision value: [0.66666667 0.66666667 0.5 0.66666667 0.8 0.5 0.75 0.875 0.71428571 0.66666667] mean value: 0.680595238095238 key: train_precision value: [0.73611111 0.73684211 0.73684211 0.69135802 0.72972973 0.74025974 0.72151899 0.7125 0.72 0.74324324] mean value: 
0.726840504690327 key: test_recall value: [0.75 0.5 0.57142857 0.85714286 0.57142857 0.85714286 0.75 0.875 0.625 0.75 ] mean value: 0.7107142857142857 key: train_recall value: [0.77941176 0.82352941 0.8115942 0.8115942 0.7826087 0.82608696 0.83823529 0.83823529 0.79411765 0.80882353] mean value: 0.8114236999147485 key: test_roc_auc value: [0.6875 0.625 0.53571429 0.74107143 0.72321429 0.55357143 0.73214286 0.86607143 0.66964286 0.66071429] mean value: 0.6794642857142857 key: train_roc_auc value: [0.75 0.76470588 0.75873828 0.72197357 0.74424552 0.76598465 0.75969736 0.75245098 0.74488491 0.76673061] mean value: 0.7529411764705882 key: test_jcc value: [0.54545455 0.4 0.36363636 0.6 0.5 0.46153846 0.6 0.77777778 0.5 0.54545455] mean value: 0.5293861693861693 key: train_jcc value: [0.6091954 0.63636364 0.62921348 0.59574468 0.60674157 0.64044944 0.63333333 0.62637363 0.60674157 0.63218391] mean value: 0.6216340654682218 MCC on Blind test: 0.1 Accuracy on Blind test: 0.6 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, 
importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00746417 0.00665522 0.00723457 0.00721335 0.00727034 0.00734687 0.00666738 0.00743985 0.00722885 0.00735736] mean value: 0.007187795639038086 key: score_time value: [0.009269 0.00891018 0.00945759 0.00945568 0.00962353 0.01024604 0.00979686 0.00947714 0.00947309 0.00945425] mean value: 0.009516334533691407 key: test_mcc value: [0.51639778 0.25819889 0.33928571 0.66143783 0.76376262 0.60714286 0.37796447 0.75592895 0.64465837 0.47245559] mean value: 0.5397233065771696 key: train_mcc value: [0.63242133 0.69486799 0.73721228 0.640228 0.69398264 0.64981886 0.69976319 0.63512361 0.69352089 0.63574336] mean value: 0.6712682142948946 key: test_accuracy value: [0.75 0.625 0.66666667 0.8 0.86666667 0.8 0.66666667 0.86666667 0.8 0.73333333] mean value: 0.7575000000000001 key: train_accuracy value: [0.81617647 0.84558824 0.86861314 0.81751825 0.84671533 0.82481752 0.84671533 0.81751825 0.84671533 0.81751825] mean value: 0.8347896092743666 key: test_fscore value: [0.77777778 0.57142857 0.66666667 0.82352941 0.875 0.8 0.61538462 0.88888889 0.84210526 0.77777778] mean value: 0.7638558972846898 key: train_fscore value: [0.81751825 0.85314685 0.86956522 0.82993197 0.85106383 0.82857143 0.85517241 0.81751825 0.84671533 0.82014388] mean value: 0.8389347425188644 key: test_precision value: [0.7 0.66666667 0.625 0.7 0.77777778 0.75 0.8 0.8 0.72727273 0.7 ] mean value: 0.7246717171717172 key: train_precision value: [0.8115942 0.81333333 0.86956522 0.78205128 0.83333333 0.81690141 0.80519481 0.8115942 0.84057971 0.8028169 ] mean value: 0.8186964397105242 
key: test_recall value: [0.875 0.5 0.71428571 1. 1. 0.85714286 0.5 1. 1. 0.875 ] mean value: 0.8321428571428572 key: train_recall value: [0.82352941 0.89705882 0.86956522 0.88405797 0.86956522 0.84057971 0.91176471 0.82352941 0.85294118 0.83823529] mean value: 0.8610826939471441 key: test_roc_auc value: [0.75 0.625 0.66964286 0.8125 0.875 0.80357143 0.67857143 0.85714286 0.78571429 0.72321429] mean value: 0.7580357142857143 key: train_roc_auc value: [0.81617647 0.84558824 0.86860614 0.81702899 0.84654731 0.82470162 0.8471867 0.81756181 0.84676044 0.81766837] mean value: 0.8347826086956521 key: test_jcc value: [0.63636364 0.4 0.5 0.7 0.77777778 0.66666667 0.44444444 0.8 0.72727273 0.63636364] mean value: 0.6288888888888888 key: train_jcc value: [0.69135802 0.74390244 0.76923077 0.70930233 0.74074074 0.70731707 0.74698795 0.69135802 0.73417722 0.69512195] mean value: 0.7229496515347358 MCC on Blind test: 0.05 Accuracy on Blind test: 0.61 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.0097928 0.00793552 0.0076406 0.00769567 0.00771952 0.00770187 0.00762033 0.00775814 0.00766444 0.00767016] mean value: 0.007919907569885254 key: score_time value: [0.00912237 0.00797772 0.00790691 0.00800848 0.00794578 0.00804377 0.00801921 0.00795007 0.00798464 0.00796342] mean value: 0.008092236518859864 key: test_mcc value: [0.75 0.5 0.19642857 0.76376262 0.73214286 0.73214286 0.66143783 1. 0.64465837 0.60714286] mean value: 0.6587715957669568 key: train_mcc value: [0.79446135 0.76470588 0.78182997 0.82480818 0.79590547 0.79590547 0.79560955 0.79560955 0.78298457 0.78107015] mean value: 0.7912890152297882 key: test_accuracy value: [0.875 0.75 0.6 0.86666667 0.86666667 0.86666667 0.8 1. 0.8 0.8 ] mean value: 0.8225 key: train_accuracy value: [0.89705882 0.88235294 0.89051095 0.91240876 0.89781022 0.89781022 0.89781022 0.89781022 0.89051095 0.89051095] mean value: 0.8954594246457708 key: test_fscore value: [0.875 0.75 0.57142857 0.875 0.85714286 0.85714286 0.76923077 1. 0.84210526 0.8 ] mean value: 0.819705031810295 key: train_fscore value: [0.89552239 0.88235294 0.88888889 0.91304348 0.9 0.9 0.89705882 0.89705882 0.88549618 0.88888889] mean value: 0.894831041553975 key: test_precision value: [0.875 0.75 0.57142857 0.77777778 0.85714286 0.85714286 1. 1. 0.72727273 0.85714286] mean value: 0.8272907647907648 key: train_precision value: [0.90909091 0.88235294 0.90909091 0.91304348 0.88732394 0.88732394 0.89705882 0.89705882 0.92063492 0.89552239] mean value: 0.8998501080696548 key: test_recall value: [0.875 0.75 0.57142857 1. 
0.85714286 0.85714286 0.625 1. 1. 0.75 ] mean value: 0.8285714285714285 key: train_recall value: [0.88235294 0.88235294 0.86956522 0.91304348 0.91304348 0.91304348 0.89705882 0.89705882 0.85294118 0.88235294] mean value: 0.8902813299232737 key: test_roc_auc value: [0.875 0.75 0.59821429 0.875 0.86607143 0.86607143 0.8125 1. 0.78571429 0.80357143] mean value: 0.8232142857142857 key: train_roc_auc value: [0.89705882 0.88235294 0.89066496 0.91240409 0.89769821 0.89769821 0.89780477 0.89780477 0.8902387 0.89045183] mean value: 0.8954177323103154 key: test_jcc value: [0.77777778 0.6 0.4 0.77777778 0.75 0.75 0.625 1. 0.72727273 0.66666667] mean value: 0.7074494949494949 key: train_jcc value: [0.81081081 0.78947368 0.8 0.84 0.81818182 0.81818182 0.81333333 0.81333333 0.79452055 0.8 ] mean value: 0.8097835345996846 MCC on Blind test: 0.11 Accuracy on Blind test: 0.65 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( 
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.61664748 0.47898817 0.45502543 0.50637126 0.53669834 0.53992033 0.47287774 0.47633958 0.47858143 0.62103176] mean value: 0.5182481527328491 key: score_time value: [0.01329303 0.01312971 0.01097465 0.01341534 0.01492548 0.01332402 0.01094341 0.01338291 0.01885128 0.01098609] mean value: 0.013322591781616211 key: test_mcc value: [0.8819171 0.51639778 0.37796447 1. 0.60714286 0.60714286 0.46428571 0.87287156 0.64465837 0.60714286] mean value: 0.6579523574070305 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.75 0.66666667 1. 0.8 0.8 0.73333333 0.93333333 0.8 0.8 ] mean value: 0.8220833333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.71428571 0.70588235 1. 0.8 0.8 0.75 0.94117647 0.84210526 0.8 ] mean value: 0.8286783134306354 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.83333333 0.6 1. 0.75 0.75 0.75 0.88888889 0.72727273 0.85714286] mean value: 0.8156637806637806 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.625 0.85714286 1. 0.85714286 0.85714286 0.75 1. 1. 0.75 ] mean value: 0.8571428571428571 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.75 0.67857143 1. 0.80357143 0.80357143 0.73214286 0.92857143 0.78571429 0.80357143] mean value: 0.8223214285714285 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 
1. 1.] mean value: 1.0 key: test_jcc value: [0.875 0.55555556 0.54545455 1. 0.66666667 0.66666667 0.6 0.88888889 0.72727273 0.66666667] mean value: 0.7192171717171717 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.68 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.02302074 0.00763321 0.00718451 0.00732636 0.00716519 0.00725317 0.00720024 0.00724506 0.00735903 0.00749445] mean value: 0.00888819694519043 key: score_time value: [0.01008129 0.00808263 0.00788021 0.00784898 0.00779343 0.00776839 0.00773787 0.00773025 0.00829577 0.00780368] mean value: 0.00810225009918213 key: test_mcc value: [0.8819171 1. 1. 1. 0.6000992 0.73214286 0.87287156 0.75592895 0.73214286 0.56407607] mean value: 0.8139178597903081 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 1. 1. 1. 0.8 0.86666667 0.93333333 0.86666667 0.86666667 0.73333333] mean value: 0.9004166666666666 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.94117647 1. 1. 1. 0.76923077 0.85714286 0.94117647 0.88888889 0.875 0.66666667] mean value: 0.8939282123105652 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 1. 1. 1. 0.83333333 0.85714286 0.88888889 0.8 0.875 1. ] mean value: 0.9143253968253968 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.71428571 0.85714286 1. 1. 0.875 0.5 ] mean value: 0.8946428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 1. 1. 1. 0.79464286 0.86607143 0.92857143 0.85714286 0.86607143 0.75 ] mean value: 0.9 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 1. 1. 1. 0.625 0.75 0.88888889 0.8 0.77777778 0.5 ] mean value: 0.8230555555555555 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.86 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, 
colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.07873535 0.07909012 0.07848859 0.07919955 0.07896852 0.07857132 0.08103371 0.08165836 0.07918024 0.08250403] mean value: 0.07974298000335693 key: score_time value: [0.01622057 0.01643443 0.01677704 0.01642728 0.01640582 0.01631761 0.01749635 0.01630569 0.01675391 0.01715064] mean value: 0.01662893295288086 key: test_mcc value: [0.8819171 0.51639778 0.49099025 1. 0.875 0.73214286 0.76376262 1. 0.75592895 0.875 ] mean value: 0.7891139555200787 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.75 0.73333333 1. 0.93333333 0.86666667 0.86666667 1. 0.86666667 0.93333333] mean value: 0.88875 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.71428571 0.75 1. 0.93333333 0.85714286 0.85714286 1. 0.88888889 0.93333333] mean value: 0.8867460317460317 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.83333333 0.66666667 1. 0.875 0.85714286 1. 1. 0.8 1. ] mean value: 0.9032142857142857 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.625 0.85714286 1. 1. 0.85714286 0.75 1. 1. 0.875 ] mean value: 0.8839285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.75 0.74107143 1. 0.9375 0.86607143 0.875 1. 0.85714286 0.9375 ] mean value: 0.8901785714285715 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.875 0.55555556 0.6 1. 0.875 0.75 0.75 1. 0.8 0.875 ] mean value: 0.8080555555555555 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.68 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00679111 0.00661206 0.00676751 0.00668883 0.00663257 0.00666237 0.00663257 0.00670314 0.00696945 0.00672388] mean value: 0.006718349456787109 key: score_time value: [0.00769448 0.00768995 0.00776768 0.00772476 0.00775385 0.00774527 0.00774169 0.00774693 0.00784659 0.00774169] mean value: 0.007745289802551269 key: test_mcc value: [0.40451992 0.40451992 0.32732684 1. 0.76376262 0.46428571 0.13363062 0.87287156 0.73214286 0.21821789] mean value: 0.5321277929700597 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.6875 0.6875 0.66666667 1. 0.86666667 0.73333333 0.53333333 0.93333333 0.86666667 0.6 ] mean value: 0.7575 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.73684211 0.61538462 0.61538462 1. 0.875 0.71428571 0.36363636 0.94117647 0.875 0.57142857] mean value: 0.7308138455971274 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.63636364 0.8 0.66666667 1. 0.77777778 0.71428571 0.66666667 0.88888889 0.875 0.66666667] mean value: 0.7692316017316018 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.5 0.57142857 1. 1. 0.71428571 0.25 1. 0.875 0.5 ] mean value: 0.7285714285714285 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.6875 0.6875 0.66071429 1. 0.875 0.73214286 0.55357143 0.92857143 0.86607143 0.60714286] mean value: 0.7598214285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.58333333 0.44444444 0.44444444 1. 0.77777778 0.55555556 0.22222222 0.88888889 0.77777778 0.4 ] mean value: 0.6094444444444445 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.68 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [0.98909354 0.98514724 1.04687738 0.98289633 0.98306084 0.98102474 0.9808023 0.98257184 0.98120975 0.97967005] mean value: 0.9892354011535645 key: score_time value: [0.09175563 0.08826041 0.08760238 0.08777761 0.08745551 0.08774495 0.08790946 0.08762598 0.08742118 0.08845329] mean value: 0.08820064067840576 key: test_mcc value: [0.8819171 0.75 0.76376262 1. 0.875 0.73214286 0.60714286 0.87287156 0.87287156 0.76376262] mean value: 0.8119471171513797 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.875 0.86666667 1. 0.93333333 0.86666667 0.8 0.93333333 0.93333333 0.86666667] mean value: 0.90125 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.875 0.875 1. 0.93333333 0.85714286 0.8 0.94117647 0.94117647 0.85714286] mean value: 0.9013305322128852 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.875 0.77777778 1. 
0.875 0.85714286 0.85714286 0.88888889 0.88888889 1. ] mean value: 0.901984126984127 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.875 1. 1. 1. 0.85714286 0.75 1. 1. 0.75 ] mean value: 0.9107142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.875 0.875 1. 0.9375 0.86607143 0.80357143 0.92857143 0.92857143 0.875 ] mean value: 0.9026785714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.875 0.77777778 0.77777778 1. 0.875 0.75 0.66666667 0.88888889 0.88888889 0.75 ] mean value: 0.825 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.84 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, 
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3.
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.8277626 0.8267982 0.83943486 0.95936847 0.89719224 0.9292078 0.87691498 0.90619445 0.85252666 0.84288502] mean value: 0.8758285284042359 key: score_time value: [0.23116565 0.20367575 0.20599627 0.15598726 0.19488597 0.1595974 0.24725604 0.22806668 0.2303443 0.21640897] mean value: 0.20733842849731446 key: test_mcc value: [0.8819171 0.75 0.76376262 1. 0.875 0.73214286 0.60714286 0.87287156 0.87287156 0.66143783] mean value: 0.8017146383453971 key: train_mcc value: [0.98540068 0.98540068 0.95630861 0.98550418 0.98550418 0.98550418 0.98550418 0.97080136 0.97080136 0.98550418] mean value: 0.9796233587390223 key: test_accuracy value: [0.9375 0.875 0.86666667 1. 0.93333333 0.86666667 0.8 0.93333333 0.93333333 0.8 ] mean value: 0.8945833333333334 key: train_accuracy value: [0.99264706 0.99264706 0.97810219 0.99270073 0.99270073 0.99270073 0.99270073 0.98540146 0.98540146 0.99270073] mean value: 0.9897702876771146 key: test_fscore value: [0.93333333 0.875 0.875 1. 0.93333333 0.85714286 0.8 0.94117647 0.94117647 0.76923077] mean value: 0.8925393234216764 key: train_fscore value: [0.99259259 0.99259259 0.97810219 0.99280576 0.99280576 0.99280576 0.99259259 0.98529412 0.98529412 0.99259259] mean value: 0.9897478061632561 key: test_precision value: [1. 0.875 0.77777778 1. 0.875 0.85714286 0.85714286 0.88888889 0.88888889 1. ] mean value: 0.901984126984127 key: train_precision value: [1. 1. 0.98529412 0.98571429 0.98571429 0.98571429 1. 0.98529412 0.98529412 1. ] mean value: 0.9913025210084034 key: test_recall value: [0.875 0.875 1. 1. 1. 0.85714286 0.75 1. 1. 
0.625 ] mean value: 0.8982142857142857 key: train_recall value: [0.98529412 0.98529412 0.97101449 1. 1. 1. 0.98529412 0.98529412 0.98529412 0.98529412] mean value: 0.9882779198635976 key: test_roc_auc value: [0.9375 0.875 0.875 1. 0.9375 0.86607143 0.80357143 0.92857143 0.92857143 0.8125 ] mean value: 0.8964285714285715 key: train_roc_auc value: [0.99264706 0.99264706 0.97815431 0.99264706 0.99264706 0.99264706 0.99264706 0.98540068 0.98540068 0.99264706] mean value: 0.9897485080988918 key: test_jcc value: [0.875 0.77777778 0.77777778 1. 0.875 0.75 0.66666667 0.88888889 0.88888889 0.625 ] mean value: 0.8125 key: train_jcc value: [0.98529412 0.98529412 0.95714286 0.98571429 0.98571429 0.98571429 0.98529412 0.97101449 0.97101449 0.98529412] mean value: 0.9797491170381196 MCC on Blind test: 0.11 Accuracy on Blind test: 0.83 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01690888 0.00677323 0.00677204 0.00677943 0.0067265 0.00685477 0.00684571 0.00681353 0.00681353 0.00683665] mean value: 0.00781242847442627 key: score_time value: [0.01041579 0.00778389 0.00794005 0.00778031 0.00778699 0.00781918 0.00782156 0.00780797 0.00778556 0.0078373 ] mean value: 0.008077859878540039 key: test_mcc value: [0.37796447 0.25819889 0.07142857 0.49099025 0.47245559 0.13363062 0.46428571 0.73214286 0.33928571 0.32732684] mean value: 0.36677095205019633 key: train_mcc value: [0.5008673 0.53311399 0.52059257 0.45151662 0.49006025 0.5360985 0.52559229 0.51215762 0.49197671 0.53517487] mean value: 0.5097150730382196 key: test_accuracy value: [0.6875 0.625 0.53333333 0.73333333 0.73333333 0.53333333 0.73333333 0.86666667 0.66666667 0.66666667] mean value: 0.6779166666666666 key: train_accuracy value: [0.75 0.76470588 0.75912409 0.72262774 0.74452555 0.76642336 0.75912409 0.75182482 0.74452555 0.76642336] mean value: 0.7529304422498926 key: test_fscore value: [0.70588235 0.57142857 0.53333333 0.75 0.66666667 0.63157895 0.75 0.875 0.66666667 0.70588235] mean value: 0.6856438891346012 key: train_fscore value: [0.75714286 0.77777778 0.77241379 0.74666667 0.75524476 0.78082192 0.7755102 0.77027027 0.75524476 0.77464789] mean value: 0.7665740884664326 key: test_precision value: [0.66666667 0.66666667 0.5 0.66666667 0.8 0.5 0.75 0.875 0.71428571 0.66666667] mean value: 0.680595238095238 key: train_precision value: [0.73611111 0.73684211 0.73684211 0.69135802 0.72972973 0.74025974 0.72151899 0.7125 0.72 0.74324324] mean value: 
0.726840504690327 key: test_recall value: [0.75 0.5 0.57142857 0.85714286 0.57142857 0.85714286 0.75 0.875 0.625 0.75 ] mean value: 0.7107142857142857 key: train_recall value: [0.77941176 0.82352941 0.8115942 0.8115942 0.7826087 0.82608696 0.83823529 0.83823529 0.79411765 0.80882353] mean value: 0.8114236999147485 key: test_roc_auc value: [0.6875 0.625 0.53571429 0.74107143 0.72321429 0.55357143 0.73214286 0.86607143 0.66964286 0.66071429] mean value: 0.6794642857142857 key: train_roc_auc value: [0.75 0.76470588 0.75873828 0.72197357 0.74424552 0.76598465 0.75969736 0.75245098 0.74488491 0.76673061] mean value: 0.7529411764705882 key: test_jcc value: [0.54545455 0.4 0.36363636 0.6 0.5 0.46153846 0.6 0.77777778 0.5 0.54545455] mean value: 0.5293861693861693 key: train_jcc value: [0.6091954 0.63636364 0.62921348 0.59574468 0.60674157 0.64044944 0.63333333 0.62637363 0.60674157 0.63218391] mean value: 0.6216340654682218 MCC on Blind test: 0.1 Accuracy on Blind test: 0.6 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), 
('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 
'disulfide_rr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.09977555 0.03077435 0.03092337 0.03185725 0.03266478 0.20152545 0.03012586 0.03033113 0.03158212 0.03259635] mean value: 0.05521562099456787 key: score_time value: [0.01020741 0.00965858 0.00987267 0.0099175 0.01043272 0.01017642 0.00950527 0.0099225 0.00961161 0.00984406] mean value: 0.009914875030517578 key: test_mcc value: [1. 0.75 1. 1. 0.73214286 1. 0.87287156 1. 0.87287156 0.76376262] mean value: 0.8991648594856769 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.875 1. 1. 0.86666667 1. 0.93333333 1. 0.93333333 0.86666667] mean value: 0.9475 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.875 1. 1. 0.85714286 1. 0.94117647 1. 0.94117647 0.85714286] mean value: 0.9471638655462185 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.875 1. 1. 0.85714286 1. 0.88888889 1. 0.88888889 1. ] mean value: 0.9509920634920634 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 1. 1. 0.85714286 1. 1. 1. 1. 0.75 ] mean value: 0.9482142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.875 1. 1. 0.86607143 1. 0.92857143 1. 0.92857143 0.875 ] mean value: 0.9473214285714285 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.77777778 1. 1. 0.75 1. 0.88888889 1. 
0.88888889 0.75 ] mean value: 0.9055555555555556 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.84 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.00941396 0.01151013 0.01147294 0.01190257 0.0120573 0.01343966 0.01201916 0.01195407 0.01195812 0.01198363] mean value: 0.011771154403686524 key: score_time value: [0.01016879 0.00986719 0.01031709 0.01051497 0.01036811 0.0106349 0.01084495 0.01056862 0.01060319 0.01060867] mean value: 0.010449647903442383 key: test_mcc value: [1. 0.62994079 0.49099025 1. 0.875 0.73214286 0.87287156 1. 0.75592895 0.75592895] mean value: 0.811280335150343 key: train_mcc value: [0.91215932 0.95681396 0.92944673 0.88466669 0.89863497 0.94199209 0.90025835 0.9139999 0.91281179 0.87099729] mean value: 0.9121781087453906 key: test_accuracy value: [1. 0.8125 0.73333333 1. 0.93333333 0.86666667 0.93333333 1. 
0.86666667 0.86666667] mean value: 0.90125 key: train_accuracy value: [0.95588235 0.97794118 0.96350365 0.94160584 0.94890511 0.97080292 0.94890511 0.95620438 0.95620438 0.93430657] mean value: 0.9554261485616145 key: test_fscore value: [1. 0.82352941 0.75 1. 0.93333333 0.85714286 0.94117647 1. 0.88888889 0.88888889] mean value: 0.908295985060691 key: train_fscore value: [0.95652174 0.97841727 0.96503497 0.94366197 0.95035461 0.97142857 0.95035461 0.95714286 0.95652174 0.93617021] mean value: 0.9565608542509413 key: test_precision value: [1. 0.77777778 0.66666667 1. 0.875 0.85714286 0.88888889 1. 0.8 0.8 ] mean value: 0.866547619047619 key: train_precision value: [0.94285714 0.95774648 0.93243243 0.91780822 0.93055556 0.95774648 0.91780822 0.93055556 0.94285714 0.90410959] mean value: 0.9334476814401569 key: test_recall value: [1. 0.875 0.85714286 1. 1. 0.85714286 1. 1. 1. 1. ] mean value: 0.9589285714285715 key: train_recall value: [0.97058824 1. 1. 0.97101449 0.97101449 0.98550725 0.98529412 0.98529412 0.97058824 0.97058824] mean value: 0.9809889173060529 key: test_roc_auc value: [1. 0.8125 0.74107143 1. 0.9375 0.86607143 0.92857143 1. 0.85714286 0.85714286] mean value: 0.9 key: train_roc_auc value: [0.95588235 0.97794118 0.96323529 0.9413896 0.94874254 0.9706948 0.9491688 0.95641517 0.95630861 0.93456948] mean value: 0.9554347826086956 key: test_jcc value: [1. 0.7 0.6 1. 0.875 0.75 0.88888889 1. 
0.8 0.8 ] mean value: 0.8413888888888889 key: train_jcc value: [0.91666667 0.95774648 0.93243243 0.89333333 0.90540541 0.94444444 0.90540541 0.91780822 0.91666667 0.88 ] mean value: 0.9169909052405676 MCC on Blind test: 0.06 Accuracy on Blind test: 0.65 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.02651024 0.0071528 0.00675678 0.00662589 0.00688267 0.00663829 0.00680137 0.00680256 0.00683355 0.00683928] mean value: 0.008784341812133788 key: score_time value: [0.01571369 0.00825262 0.00792074 0.0078783 0.00784111 0.00788522 0.00770473 0.00788617 0.00793123 0.00775051] mean value: 0.008676433563232422 key: test_mcc value: [0.62994079 0.37796447 0.21821789 0.60714286 0.73214286 0.26189246 0.66143783 0.87287156 0.46428571 0.46428571] mean value: 0.529018214646944 key: train_mcc value: [0.55979287 0.57408838 0.62076318 0.57703846 0.54864511 0.60584099 0.57730871 0.51887407 0.56235346 0.56235346] mean value: 0.5707058671664582 key: test_accuracy value: [0.8125 0.6875 0.6 0.8 0.86666667 0.6 0.8 0.93333333 0.73333333 0.73333333] mean value: 0.7566666666666667 key: train_accuracy value: [0.77941176 0.78676471 0.81021898 0.78832117 0.77372263 0.80291971 
0.78832117 0.75912409 0.7810219 0.7810219 ] mean value: 0.785084800343495 key: test_fscore value: [0.8 0.66666667 0.625 0.8 0.85714286 0.66666667 0.76923077 0.94117647 0.75 0.75 ] mean value: 0.7625883430295195 key: train_fscore value: [0.78571429 0.79136691 0.80882353 0.79432624 0.78321678 0.8057554 0.79136691 0.76258993 0.7826087 0.7826087 ] mean value: 0.7888377367472581 key: test_precision value: [0.85714286 0.71428571 0.55555556 0.75 0.85714286 0.54545455 1. 0.88888889 0.75 0.75 ] mean value: 0.7668470418470419 key: train_precision value: [0.76388889 0.77464789 0.82089552 0.77777778 0.75675676 0.8 0.77464789 0.74647887 0.77142857 0.77142857] mean value: 0.775795073655595 key: test_recall value: [0.75 0.625 0.71428571 0.85714286 0.85714286 0.85714286 0.625 1. 0.75 0.75 ] mean value: 0.7785714285714286 key: train_recall value: [0.80882353 0.80882353 0.79710145 0.8115942 0.8115942 0.8115942 0.80882353 0.77941176 0.79411765 0.79411765] mean value: 0.8026001705029838 key: test_roc_auc value: [0.8125 0.6875 0.60714286 0.80357143 0.86607143 0.61607143 0.8125 0.92857143 0.73214286 0.73214286] mean value: 0.7598214285714285 key: train_roc_auc value: [0.77941176 0.78676471 0.81031543 0.78815004 0.77344416 0.80285592 0.78846974 0.7592711 0.78111679 0.78111679] mean value: 0.7850916453537937 key: test_jcc value: [0.66666667 0.5 0.45454545 0.66666667 0.75 0.5 0.625 0.88888889 0.6 0.6 ] mean value: 0.6251767676767677 key: train_jcc value: [0.64705882 0.6547619 0.67901235 0.65882353 0.64367816 0.6746988 0.6547619 0.61627907 0.64285714 0.64285714] mean value: 0.651478881972599 MCC on Blind test: 0.11 Accuracy on Blind test: 0.62 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00804782 0.00781918 0.00786948 0.00782323 0.00760174 0.00795841 0.00802374 0.00730324 0.00732517 0.00728822] mean value: 0.0077060222625732425 key: score_time value: [0.00777936 0.00796533 0.00840831 0.00785279 0.00842953 0.00844431 0.00777602 0.00784135 0.00779104 0.00782919] mean value: 0.008011722564697265 key: test_mcc value: [0.8819171 0.62994079 0.49099025 1. 0.73214286 0.60714286 0.6000992 1. 0.64465837 0.6000992 ] mean value: 0.7186990626871869 key: train_mcc value: [0.89949371 0.91215932 0.92791659 0.88466669 0.94199209 0.94160273 0.88938138 0.8687127 0.84688958 0.86000692] mean value: 0.8972821710057162 key: test_accuracy value: [0.9375 0.8125 0.73333333 1. 0.86666667 0.8 0.8 1. 0.8 0.8 ] mean value: 0.855 key: train_accuracy value: [0.94852941 0.95588235 0.96350365 0.94160584 0.97080292 0.97080292 0.94160584 0.93430657 0.91970803 0.9270073 ] mean value: 0.9473754830399312 key: test_fscore value: [0.93333333 0.82352941 0.75 1. 0.85714286 0.8 0.82352941 1. 0.84210526 0.82352941] mean value: 0.8653169688928203 key: train_fscore value: [0.94656489 0.95652174 0.96296296 0.94366197 0.97142857 0.97101449 0.94444444 0.93430657 0.92413793 0.93055556] mean value: 0.9485599123980311 key: test_precision value: [1. 0.77777778 0.66666667 1. 0.85714286 0.75 0.77777778 1. 
0.72727273 0.77777778] mean value: 0.8334415584415584 key: train_precision value: [0.98412698 0.94285714 0.98484848 0.91780822 0.95774648 0.97101449 0.89473684 0.92753623 0.87012987 0.88157895] mean value: 0.9332383694125169 key: test_recall value: [0.875 0.875 0.85714286 1. 0.85714286 0.85714286 0.875 1. 1. 0.875 ] mean value: 0.9071428571428571 key: train_recall value: [0.91176471 0.97058824 0.94202899 0.97101449 0.98550725 0.97101449 1. 0.94117647 0.98529412 0.98529412] mean value: 0.9663682864450128 key: test_roc_auc value: [0.9375 0.8125 0.74107143 1. 0.86607143 0.80357143 0.79464286 1. 0.78571429 0.79464286] mean value: 0.8535714285714285 key: train_roc_auc value: [0.94852941 0.95588235 0.96366155 0.9413896 0.9706948 0.97080136 0.94202899 0.93435635 0.92018329 0.92742967] mean value: 0.9474957374254049 key: test_jcc value: [0.875 0.7 0.6 1. 0.75 0.66666667 0.7 1. 0.72727273 0.7 ] mean value: 0.7718939393939394 key: train_jcc value: [0.89855072 0.91666667 0.92857143 0.89333333 0.94444444 0.94366197 0.89473684 0.87671233 0.85897436 0.87012987] mean value: 0.9025781969461155 MCC on Blind test: 0.07 Accuracy on Blind test: 0.69 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00999784 0.0094552 0.00723362 0.00716352 0.00696826 0.00690985 0.00689554 0.00771546 0.0079 0.00781608] mean value: 0.007805538177490234 key: score_time value: [0.01038742 0.00955176 0.00792933 0.00781178 0.00785041 0.00781822 0.00781894 0.00777292 0.00842071 0.00786495] mean value: 0.008322644233703613 key: test_mcc value: [0.8819171 0.62994079 0.49099025 0.875 0.76376262 0.60714286 0.46428571 0.53452248 0.46428571 0.47245559] mean value: 0.6184303121694533 key: train_mcc value: [0.88580789 0.81600218 0.92791659 0.9001543 0.80787444 0.80014442 0.8437116 0.64876322 0.87609014 0.86339318] mean value: 0.836985797579123 key: test_accuracy value: [0.9375 0.8125 0.73333333 0.93333333 0.86666667 0.8 0.73333333 0.73333333 0.73333333 0.73333333] mean value: 0.8016666666666666 key: train_accuracy value: [0.94117647 0.90441176 0.96350365 0.94890511 0.89781022 0.89051095 0.91970803 0.79562044 0.93430657 0.9270073 ] mean value: 0.912296049806784 key: test_fscore value: [0.93333333 0.82352941 0.75 0.93333333 0.875 0.8 0.75 0.8 0.75 0.77777778] mean value: 0.819297385620915 key: train_fscore value: [0.93846154 0.91034483 0.96296296 0.95104895 0.90666667 0.90196078 0.91472868 0.82926829 0.92913386 0.93150685] mean value: 0.9176083413476306 key: test_precision value: [1. 0.77777778 0.66666667 0.875 0.77777778 0.75 0.75 0.66666667 0.75 0.7 ] mean value: 0.7713888888888889 key: train_precision value: [0.98387097 0.85714286 0.98484848 0.91891892 0.83950617 0.82142857 0.96721311 0.70833333 1. 
0.87179487] mean value: 0.8953057292802578 key: test_recall value: [0.875 0.875 0.85714286 1. 1. 0.85714286 0.75 1. 0.75 0.875 ] mean value: 0.8839285714285714 key: train_recall value: [0.89705882 0.97058824 0.94202899 0.98550725 0.98550725 1. 0.86764706 1. 0.86764706 1. ] mean value: 0.9515984654731457 key: test_roc_auc value: [0.9375 0.8125 0.74107143 0.9375 0.875 0.80357143 0.73214286 0.71428571 0.73214286 0.72321429] mean value: 0.8008928571428571 key: train_roc_auc value: [0.94117647 0.90441176 0.96366155 0.94863598 0.89716539 0.88970588 0.91933078 0.79710145 0.93382353 0.92753623] mean value: 0.9122549019607843 key: test_jcc value: [0.875 0.7 0.6 0.875 0.77777778 0.66666667 0.6 0.66666667 0.6 0.63636364] mean value: 0.6997474747474748 key: train_jcc value: [0.88405797 0.83544304 0.92857143 0.90666667 0.82926829 0.82142857 0.84285714 0.70833333 0.86764706 0.87179487] mean value: 0.8496068375147647 MCC on Blind test: 0.06 Accuracy on Blind test: 0.66 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.07770419 0.06228852 0.0625062 0.06266785 0.06289601 0.06246185 0.06297612 0.06292748 0.06235862 0.06280899] mean value: 0.06415958404541015 key: score_time value: [0.01418233 0.01393175 0.01422071 0.01399136 0.01391673 0.01394653 0.01503801 0.01420355 0.01413107 0.01432395] mean value: 0.014188599586486817 key: test_mcc value: [0.8819171 0.75 0.875 0.875 0.73214286 0.87287156 0.87287156 1. 0.75592895 0.76376262] mean value: 0.8379494644563421 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.875 0.93333333 0.93333333 0.86666667 0.93333333 0.93333333 1. 0.86666667 0.86666667] mean value: 0.9145833333333333 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.93333333 0.875 0.93333333 0.93333333 0.85714286 0.92307692 0.94117647 1. 0.88888889 0.85714286] mean value: 0.9142427996839761 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.875 0.875 0.875 0.85714286 1. 0.88888889 1. 0.8 1. ] mean value: 0.9171031746031746 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.875 0.875 1. 1. 0.85714286 0.85714286 1. 1. 1. 0.75 ] mean value: 0.9214285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.875 0.9375 0.9375 0.86607143 0.92857143 0.92857143 1. 0.85714286 0.875 ] mean value: 0.9142857142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.875 0.77777778 0.875 0.875 0.75 0.85714286 0.88888889 1. 0.8 0.75 ] mean value: 0.8448809523809524 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.75 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.02706838 0.02781153 0.04628038 0.03850269 0.0461607 0.04655218 0.04729891 0.04126883 0.03603816 0.0402298 ] mean value: 0.03972115516662598 key: score_time value: [0.02073336 0.02294326 0.03598142 0.040658 0.03594398 0.03722 0.03625917 0.02713251 0.02583647 0.03715944] mean value: 0.03198676109313965 key: test_mcc value: [0.8819171 0.8819171 1. 1. 0.73214286 0.73214286 0.87287156 0.87287156 0.73214286 1. ] mean value: 0.8706005900692904 key: train_mcc value: [0.98540068 1. 1. 1. 1. 1. 1. 0.98550725 1. 1. ] mean value: 0.9970907922626642 key: test_accuracy value: [0.9375 0.9375 1. 1. 0.86666667 0.86666667 0.93333333 0.93333333 0.86666667 1. ] mean value: 0.9341666666666667 key: train_accuracy value: [0.99264706 1. 1. 1. 1. 1. 1. 0.99270073 1. 1. ] mean value: 0.9985347788750537 key: test_fscore value: [0.94117647 0.94117647 1. 
1. 0.85714286 0.85714286 0.94117647 0.94117647 0.875 1. ] mean value: 0.9353991596638656 key: train_fscore value: [0.99259259 1. 1. 1. 1. 1. 1. 0.99270073 1. 1. ] mean value: 0.99852933225196 key: test_precision value: [0.88888889 0.88888889 1. 1. 0.85714286 0.85714286 0.88888889 0.88888889 0.875 1. ] mean value: 0.914484126984127 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 0.98550725 1. 1. ] mean value: 0.9985507246376811 key: test_recall value: [1. 1. 1. 1. 0.85714286 0.85714286 1. 1. 0.875 1. ] mean value: 0.9589285714285715 key: train_recall value: [0.98529412 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9985294117647059 key: test_roc_auc value: [0.9375 0.9375 1. 1. 0.86607143 0.86607143 0.92857143 0.92857143 0.86607143 1. ] mean value: 0.9330357142857143 key: train_roc_auc value: [0.99264706 1. 1. 1. 1. 1. 1. 0.99275362 1. 1. ] mean value: 0.9985400682011936 key: test_jcc value: [0.88888889 0.88888889 1. 1. 0.75 0.75 0.88888889 0.88888889 0.77777778 1. ] mean value: 0.8833333333333333 key: train_jcc value: [0.98529412 1. 1. 1. 1. 1. 1. 0.98550725 1. 1. 
] mean value: 0.997080136402387 MCC on Blind test: 0.12 Accuracy on Blind test: 0.85 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.03373861 0.03912592 0.04229856 0.04023004 0.04611397 0.04011154 0.03928065 0.04038763 0.04050183 0.04010868] mean value: 0.04018974304199219 key: score_time value: [0.0198133 0.01117086 0.01123762 0.02080536 0.02091765 0.01118398 0.02124166 0.02203465 0.01984 0.02217436] mean value: 0.01804194450378418 key: test_mcc value: [0.77459667 0.37796447 0.33928571 0.56407607 0.76376262 0.73214286 0.37796447 0.87287156 0.64465837 0.46428571] mean value: 0.5911608523782237 key: train_mcc value: [0.94117647 0.95598573 0.98550418 0.95630861 0.94160273 0.97080136 0.97080136 0.97080136 0.97080136 0.94201665] mean value: 0.9605799824099576 key: test_accuracy value: [0.875 0.6875 0.66666667 0.73333333 0.86666667 0.86666667 0.66666667 0.93333333 0.8 0.73333333] mean value: 0.7829166666666667 key: train_accuracy value: [0.97058824 0.97794118 0.99270073 0.97810219 0.97080292 0.98540146 0.98540146 0.98540146 0.98540146 0.97080292] mean value: 
0.9802544010304852 key: test_fscore value: [0.88888889 0.66666667 0.66666667 0.77777778 0.875 0.85714286 0.61538462 0.94117647 0.84210526 0.75 ] mean value: 0.7880809206273602 key: train_fscore value: [0.97058824 0.97810219 0.99280576 0.97810219 0.97101449 0.98550725 0.98529412 0.98529412 0.98529412 0.97101449] mean value: 0.980301695507708 key: test_precision value: [0.8 0.71428571 0.625 0.63636364 0.77777778 0.85714286 0.8 0.88888889 0.72727273 0.75 ] mean value: 0.7576731601731602 key: train_precision value: [0.97058824 0.97101449 0.98571429 0.98529412 0.97101449 0.98550725 0.98529412 0.98529412 0.98529412 0.95714286] mean value: 0.9782158080623554 key: test_recall value: [1. 0.625 0.71428571 1. 1. 0.85714286 0.5 1. 1. 0.75 ] mean value: 0.8446428571428571 key: train_recall value: [0.97058824 0.98529412 1. 0.97101449 0.97101449 0.98550725 0.98529412 0.98529412 0.98529412 0.98529412] mean value: 0.982459505541347 key: test_roc_auc value: [0.875 0.6875 0.66964286 0.75 0.875 0.86607143 0.67857143 0.92857143 0.78571429 0.73214286] mean value: 0.7848214285714286 key: train_roc_auc value: [0.97058824 0.97794118 0.99264706 0.97815431 0.97080136 0.98540068 0.98540068 0.98540068 0.98540068 0.97090793] mean value: 0.9802642796248935 key: test_jcc value: [0.8 0.5 0.5 0.63636364 0.77777778 0.75 0.44444444 0.88888889 0.72727273 0.6 ] mean value: 0.6624747474747474 key: train_jcc value: [0.94285714 0.95714286 0.98571429 0.95714286 0.94366197 0.97142857 0.97101449 0.97101449 0.97101449 0.94366197] mean value: 0.9614653136208555 MCC on Blind test: 0.05 Accuracy on Blind test: 0.62 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', 
MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 
'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.09781337 0.10118818 0.09096408 0.09063625 0.08830929 0.08863807 0.1010282 0.0922606 0.0915482 0.09096527] mean value: 0.09333515167236328 key: score_time value: [0.00950933 0.00844288 0.00881338 0.00852418 0.00897932 0.00888801 0.00875974 0.00871754 0.00904465 0.00866079] mean value: 0.008833980560302735 key: test_mcc value: [0.8819171 0.8819171 1. 1. 0.73214286 0.73214286 0.87287156 0.87287156 0.73214286 1. ] mean value: 0.8706005900692904 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.9375 1. 1. 0.86666667 0.86666667 0.93333333 0.93333333 0.86666667 1. ] mean value: 0.9341666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.94117647 1. 1. 0.85714286 0.85714286 0.94117647 0.94117647 0.875 1. ] mean value: 0.9353991596638656 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.88888889 1. 1. 0.85714286 0.85714286 0.88888889 0.88888889 0.875 1. ] mean value: 0.914484126984127 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.85714286 0.85714286 1. 1. 0.875 1. ] mean value: 0.9589285714285715 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.9375 1. 1. 0.86607143 0.86607143 0.92857143 0.92857143 0.86607143 1. 
] mean value: 0.9330357142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.88888889 1. 1. 0.75 0.75 0.88888889 0.88888889 0.77777778 1. ] mean value: 0.8833333333333333 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.82 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00983596 0.01095533 0.01153588 0.01127434 0.01293206 0.01323128 0.01175475 0.01181364 0.01138139 0.01201797] mean value: 0.011673259735107421 key: score_time value: [0.01050639 0.01042032 0.01051211 0.01094747 0.01169777 0.01332498 0.01089931 0.01096082 0.01095772 0.01388907] mean value: 0.011411595344543456 key: test_mcc value: [0.75 0.62994079 0.64465837 0.64465837 0.6000992 0.34247476 0.46770717 0.49099025 0.33928571 0.66143783] mean value: 0.5571252457078674 key: train_mcc value: [0.84051051 0.92737353 0.90259957 0.80073303 0.88938138 0.71739374 0.94318882 0.82498207 0.90246052 0.92944673] mean value: 0.8678069912939567 key: test_accuracy value: [0.875 0.8125 0.8 0.8 0.8 0.66666667 0.66666667 0.73333333 0.66666667 0.8 ] mean value: 0.7620833333333333 key: train_accuracy value: [0.91911765 
0.96323529 0.94890511 0.89051095 0.94160584 0.83941606 0.97080292 0.90510949 0.94890511 0.96350365] mean value: 0.9291112065264062 key: test_fscore value: [0.875 0.82352941 0.72727273 0.72727273 0.76923077 0.54545455 0.54545455 0.71428571 0.66666667 0.76923077] mean value: 0.7163397876633171 key: train_fscore value: [0.91603053 0.96240602 0.94656489 0.87804878 0.93846154 0.81034483 0.96969697 0.89430894 0.94573643 0.96183206] mean value: 0.9223430989384103 key: test_precision value: [0.875 0.77777778 1. 1. 0.83333333 0.75 1. 0.83333333 0.71428571 1. ] mean value: 0.8783730158730159 key: train_precision value: [0.95238095 0.98461538 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9936996336996337 key: test_recall value: [0.875 0.875 0.57142857 0.57142857 0.71428571 0.42857143 0.375 0.625 0.625 0.625 ] mean value: 0.6285714285714286 key: train_recall value: [0.88235294 0.94117647 0.89855072 0.7826087 0.88405797 0.68115942 0.94117647 0.80882353 0.89705882 0.92647059] mean value: 0.8643435635123615 key: test_roc_auc value: [0.875 0.8125 0.78571429 0.78571429 0.79464286 0.65178571 0.6875 0.74107143 0.66964286 0.8125 ] mean value: 0.7616071428571428 key: train_roc_auc value: [0.91911765 0.96323529 0.94927536 0.89130435 0.94202899 0.84057971 0.97058824 0.90441176 0.94852941 0.96323529] mean value: 0.9292306052855925 key: test_jcc value: [0.77777778 0.7 0.57142857 0.57142857 0.625 0.375 0.375 0.55555556 0.5 0.625 ] mean value: 0.5676190476190476 key: train_jcc value: [0.84507042 0.92753623 0.89855072 0.7826087 0.88405797 0.68115942 0.94117647 0.80882353 0.89705882 0.92647059] mean value: 0.8592512877778178 MCC on Blind test: 0.13 Accuracy on Blind test: 0.85 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', 
KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 
'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01431322 0.01028609 0.0085125 0.00834036 0.00857472 0.00834465 0.00830102 0.00752926 0.0077374 0.00805783] mean value: 0.00899970531463623 key: score_time value: [0.01112556 0.00929952 0.00890088 0.00855279 0.0085485 0.00859904 0.00831628 0.00797892 0.00823665 0.00807309] mean value: 0.00876312255859375 key: test_mcc value: [0.8819171 0.62994079 0.66143783 1. 0.875 0.73214286 0.6000992 1. 0.75592895 0.6000992 ] mean value: 0.7736565919262326 key: train_mcc value: [0.86849267 0.89715584 0.89791134 0.88355744 0.88355744 0.89863497 0.85440207 0.85440207 0.89791134 0.86948194] mean value: 0.8805507116446566 key: test_accuracy value: [0.9375 0.8125 0.8 1. 0.93333333 0.86666667 0.8 1. 0.86666667 0.8 ] mean value: 0.8816666666666667 key: train_accuracy value: [0.93382353 0.94852941 0.94890511 0.94160584 0.94160584 0.94890511 0.9270073 0.9270073 0.94890511 0.93430657] mean value: 0.9400601116358952 key: test_fscore value: [0.93333333 0.82352941 0.82352941 1. 0.93333333 0.85714286 0.82352941 1. 0.88888889 0.82352941] mean value: 0.8906816059757237 key: train_fscore value: [0.9352518 0.94890511 0.94890511 0.94285714 0.94285714 0.95035461 0.92753623 0.92753623 0.94890511 0.9352518 ] mean value: 0.9408360285000935 key: test_precision value: [1. 0.77777778 0.7 1. 0.875 0.85714286 0.77777778 1. 
0.8 0.77777778] mean value: 0.856547619047619 key: train_precision value: [0.91549296 0.94202899 0.95588235 0.92957746 0.92957746 0.93055556 0.91428571 0.91428571 0.94202899 0.91549296] mean value: 0.9289208153153076 key: test_recall value: [0.875 0.875 1. 1. 1. 0.85714286 0.875 1. 1. 0.875 ] mean value: 0.9357142857142857 key: train_recall value: [0.95588235 0.95588235 0.94202899 0.95652174 0.95652174 0.97101449 0.94117647 0.94117647 0.95588235 0.95588235] mean value: 0.9531969309462915 key: test_roc_auc value: [0.9375 0.8125 0.8125 1. 0.9375 0.86607143 0.79464286 1. 0.85714286 0.79464286] mean value: 0.88125 key: train_roc_auc value: [0.93382353 0.94852941 0.94895567 0.94149616 0.94149616 0.94874254 0.92710997 0.92710997 0.94895567 0.93446292] mean value: 0.940068201193521 key: test_jcc value: [0.875 0.7 0.7 1. 0.875 0.75 0.7 1. 0.8 0.7 ] mean value: 0.8099999999999999 key: train_jcc value: [0.87837838 0.90277778 0.90277778 0.89189189 0.89189189 0.90540541 0.86486486 0.86486486 0.90277778 0.87837838] mean value: 0.888400900900901 MCC on Blind test: 0.06 Accuracy on Blind test: 0.67 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:143: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:146: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy smnc_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic 
RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', 
transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.07313299 0.06227994 0.06231952 0.06033921 0.06083584 0.06107545 0.06096387 0.06110859 0.06186771 0.06140947] mean value: 0.06253325939178467 key: score_time value: [0.00833368 0.00824118 0.00828338 0.00820613 0.00824714 0.00827527 0.00827289 0.00825977 0.00888276 0.00831437] mean value: 0.008331656455993652 key: test_mcc value: [0.8819171 0.62994079 0.66143783 1. 0.875 0.73214286 0.75592895 1. 0.75592895 0.6000992 ] mean value: 0.7892395667131802 key: train_mcc value: [0.86849267 0.89715584 0.8978896 0.89863497 0.88355744 0.92709446 0.89869927 0.85440207 0.92710997 0.87099729] mean value: 0.8924033569902855 key: test_accuracy value: [0.9375 0.8125 0.8 1. 0.93333333 0.86666667 0.86666667 1. 0.86666667 0.8 ] mean value: 0.8883333333333333 key: train_accuracy value: [0.93382353 0.94852941 0.94890511 0.94890511 0.94160584 0.96350365 0.94890511 0.9270073 0.96350365 0.93430657] mean value: 0.9458995276942894 key: test_fscore value: [0.93333333 0.82352941 0.82352941 1. 0.93333333 0.85714286 0.88888889 1. 
0.88888889 0.82352941] mean value: 0.8972175536881419 key: train_fscore value: [0.9352518 0.94890511 0.94964029 0.95035461 0.94285714 0.96402878 0.94964029 0.92753623 0.96350365 0.93617021] mean value: 0.946788810763946 key: test_precision value: [1. 0.77777778 0.7 1. 0.875 0.85714286 0.8 1. 0.8 0.77777778] mean value: 0.8587698412698412 key: train_precision value: [0.91549296 0.94202899 0.94285714 0.93055556 0.92957746 0.95714286 0.92957746 0.91428571 0.95652174 0.90410959] mean value: 0.932214947084399 key: test_recall value: [0.875 0.875 1. 1. 1. 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9482142857142857 key: train_recall value: [0.95588235 0.95588235 0.95652174 0.97101449 0.95652174 0.97101449 0.97058824 0.94117647 0.97058824 0.97058824] mean value: 0.9619778346121057 key: test_roc_auc value: [0.9375 0.8125 0.8125 1. 0.9375 0.86607143 0.85714286 1. 0.85714286 0.79464286] mean value: 0.8875000000000001 key: train_roc_auc value: [0.93382353 0.94852941 0.9488491 0.94874254 0.94149616 0.96344842 0.94906223 0.92710997 0.96355499 0.93456948] mean value: 0.9459185848252345 key: test_jcc value: [0.875 0.7 0.7 1. 0.875 0.75 0.8 1. 
0.8 0.7 ] mean value: 0.82 key: train_jcc value: [0.87837838 0.90277778 0.90410959 0.90540541 0.89189189 0.93055556 0.90410959 0.86486486 0.92957746 0.88 ] mean value: 0.8991670516744799 MCC on Blind test: 0.06 Accuracy on Blind test: 0.67 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.01616359 0.01377511 0.01260805 0.01199722 0.01317811 0.01211739 0.01303506 0.01296759 0.01235151 0.01292706] mean value: 0.013112068176269531 key: score_time value: [0.01072264 0.00871634 0.00817013 0.00809884 0.00805712 0.0079968 0.00806427 0.00803137 0.00803447 0.00811362] mean value: 0.008400559425354004 key: test_mcc value: [0.8819171 0.5 0.37796447 0.73214286 0.87287156 0.60714286 0.60714286 0.60714286 0.64465837 0.6000992 ] mean value: 0.6431082135582106 key: train_mcc value: [0.77949606 0.80961181 0.82629176 0.78182997 0.81031543 0.82480818 0.75186529 0.81092683 0.82614456 0.79560955] mean value: 0.8016899442942331 key: test_accuracy value: [0.9375 0.75 0.66666667 0.86666667 0.93333333 0.8 0.8 0.8 0.8 0.8 ] mean value: 0.8154166666666667 key: train_accuracy value: [0.88970588 0.90441176 
0.91240876 0.89051095 0.90510949 0.91240876 0.87591241 0.90510949 0.91240876 0.89781022] mean value: 0.9005796479175612 key: test_fscore value: [0.93333333 0.75 0.70588235 0.85714286 0.92307692 0.8 0.8 0.8 0.84210526 0.82352941] mean value: 0.823507014141689 key: train_fscore value: [0.88888889 0.90225564 0.91044776 0.88888889 0.90510949 0.91304348 0.87407407 0.90225564 0.90909091 0.89705882] mean value: 0.8991113591173656 key: test_precision value: [1. 0.75 0.6 0.85714286 1. 0.75 0.85714286 0.85714286 0.72727273 0.77777778] mean value: 0.8176479076479076 key: train_precision value: [0.89552239 0.92307692 0.93846154 0.90909091 0.91176471 0.91304348 0.88059701 0.92307692 0.9375 0.89705882] mean value: 0.9129192704364003 key: test_recall value: [0.875 0.75 0.85714286 0.85714286 0.85714286 0.85714286 0.75 0.75 1. 0.875 ] mean value: 0.8428571428571429 key: train_recall value: [0.88235294 0.88235294 0.88405797 0.86956522 0.89855072 0.91304348 0.86764706 0.88235294 0.88235294 0.89705882] mean value: 0.8859335038363171 key: test_roc_auc value: [0.9375 0.75 0.67857143 0.86607143 0.92857143 0.80357143 0.80357143 0.80357143 0.78571429 0.79464286] mean value: 0.8151785714285714 key: train_roc_auc value: [0.88970588 0.90441176 0.91261722 0.89066496 0.90515772 0.91240409 0.87585251 0.90494459 0.91219096 0.89780477] mean value: 0.9005754475703325 key: test_jcc value: [0.875 0.6 0.54545455 0.75 0.85714286 0.66666667 0.66666667 0.66666667 0.72727273 0.7 ] mean value: 0.705487012987013 key: train_jcc value: [0.8 0.82191781 0.83561644 0.8 0.82666667 0.84 0.77631579 0.82191781 0.83333333 0.81333333] mean value: 0.8169101177601538 MCC on Blind test: 0.12 Accuracy on Blind test: 0.66 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
/home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

[('Logistic Regression', LogisticRegression(random_state=42)),
 ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)),
 ('Gaussian NB', GaussianNB()),
 ('Naive Bayes', BernoulliNB()),
 ('K-Nearest Neighbors', KNeighborsClassifier()),
 ('SVM', SVC(random_state=42)),
 ('MLP', MLPClassifier(max_iter=500, random_state=42)),
 ('Decision Tree', DecisionTreeClassifier(random_state=42)),
 ('Extra Trees', ExtraTreesClassifier(random_state=42)),
 ('Extra Tree', ExtraTreeClassifier(random_state=42)),
 ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)),
 ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)),
 ('Naive Bayes', BernoulliNB()),
 ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)),
 ('LDA', LinearDiscriminantAnalysis()),
 ('Multinomial', MultinomialNB()),
 ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)),
 ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)),
 ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)),
 ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)),
 ('Gaussian Process', GaussianProcessClassifier(random_state=42)),
 ('Gradient Boosting', GradientBoostingClassifier(random_state=42)),
 ('QDA', QuadraticDiscriminantAnalysis()),
 ('Ridge Classifier', RidgeClassifier(random_state=42)),
 ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))]

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))])

key: fit_time
value: [0.37280297 0.37843227 0.38014102 0.37847543 0.37922144 0.38759017 0.37933087 0.39223146 0.38670659 0.38647294]
mean value: 0.38214051723480225

key: score_time
value: [0.0084753 0.00828695 0.00884271 0.00918055 0.00932026 0.00898337 0.00927162 0.00885415 0.00936317 0.00934863]
mean value: 0.008992671966552734

key: test_mcc
value: [1. 0.77459667 0.66143783 0.76376262 0.73214286 0.60714286 0.75592895 0.87287156 0.75592895 0.6000992 ]
mean value: 0.7523911478249176

key: train_mcc
value: [0.94158382 1. 0.95629932 0.94199209 0.95629932 0.98550418 0.95713391 1. 1. 1. ]
mean value: 0.9738812635764046

key: test_accuracy
value: [1. 0.875 0.8 0.86666667 0.86666667 0.8 0.86666667 0.93333333 0.86666667 0.8 ]
mean value: 0.8675

key: train_accuracy
value: [0.97058824 1. 0.97810219 0.97080292 0.97810219 0.99270073 0.97810219 1. 1. 1. ]
mean value: 0.986839845427222

key: test_fscore
value: [1. 0.88888889 0.82352941 0.875 0.85714286 0.8 0.88888889 0.94117647 0.88888889 0.82352941]
mean value: 0.8787044817927171

key: train_fscore
value: [0.97101449 1. 0.97841727 0.97142857 0.97841727 0.99280576 0.97841727 1. 1. 1. ]
mean value: 0.9870500618139029

key: test_precision
value: [1. 0.8 0.7 0.77777778 0.85714286 0.75 0.8 0.88888889 0.8 0.77777778]
mean value: 0.8151587301587302

key: train_precision
value: [0.95714286 1. 0.97142857 0.95774648 0.97142857 0.98571429 0.95774648 1. 1. 1. ]
mean value: 0.9801207243460764

key: test_recall
value: [1. 1. 1. 1. 0.85714286 0.85714286 1. 1. 1. 0.875 ]
mean value: 0.9589285714285715

key: train_recall
value: [0.98529412 1. 0.98550725 0.98550725 0.98550725 1. 1. 1. 1. 1. ]
mean value: 0.9941815856777494

key: test_roc_auc
value: [1. 0.875 0.8125 0.875 0.86607143 0.80357143 0.85714286 0.92857143 0.85714286 0.79464286]
mean value: 0.8669642857142857

key: train_roc_auc
value: [0.97058824 1. 0.97804774 0.9706948 0.97804774 0.99264706 0.97826087 1. 1. 1. ]
mean value: 0.9868286445012788

key: test_jcc
value: [1. 0.8 0.7 0.77777778 0.75 0.66666667 0.8 0.88888889 0.8 0.7 ]
mean value: 0.7883333333333333

key: train_jcc
value: [0.94366197 1. 0.95774648 0.94444444 0.95774648 0.98571429 0.95774648 1. 1. 1. ]
mean value: 0.9747060138609435

MCC on Blind test: 0.0
Accuracy on Blind test: 0.68

Model_name: Gaussian NB
Model func: GaussianNB()

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())])

key: fit_time
value: [0.00959182 0.00908065 0.00727654 0.0070405 0.0074923 0.00699615 0.00736642 0.00702 0.00749421 0.0074172 ]
mean value: 0.0076775789260864254

key: score_time
value: [0.01065612 0.01025677 0.00826311 0.0082767 0.00856185 0.00839043 0.00823283 0.00838685 0.00851226 0.00863767]
mean value: 0.008817458152770996

key: test_mcc
value: [0.77459667 0.37796447 0.49099025 0.37796447 0.21821789 0.49099025 0.18898224 0.46428571 0.64465837 0.20044593]
mean value: 0.42290962650028463

key: train_mcc
value: [0.57208135 0.54899485 0.52400868 0.47754676 0.56162481 0.60455208 0.60096088 0.6802431 0.57604541 0.66161034]
mean value: 0.5807668254236807

key: test_accuracy
value: [0.875 0.6875 0.73333333 0.66666667 0.6 0.73333333 0.6 0.73333333 0.8 0.6 ]
mean value: 0.7029166666666666

key: train_accuracy
value: [0.77205882 0.76470588 0.74452555 0.72992701 0.76642336 0.79562044 0.78832117 0.83211679 0.77372263 0.81751825]
mean value: 0.7784939888364105

key: test_fscore
value: [0.88888889 0.66666667 0.75 0.70588235 0.625 0.75 0.66666667 0.75 0.84210526 0.7 ]
mean value: 0.7345209838321294

key: train_fscore
value: [0.80254777 0.79220779 0.78527607 0.76433121 0.8 0.77419355 0.81290323 0.80991736 0.80254777 0.83870968]
mean value: 0.7982634424404584

key: test_precision
value: [0.8 0.71428571 0.66666667 0.6 0.55555556 0.66666667 0.6 0.75 0.72727273 0.58333333]
mean value: 0.6663780663780664

key: train_precision
value: [0.70786517 0.70930233 0.68085106 0.68181818 0.7032967 0.87272727 0.72413793 0.9245283 0.70786517 0.74712644]
mean value: 0.7459518554034876

key: test_recall
value: [1. 0.625 0.85714286 0.85714286 0.71428571 0.85714286 0.75 0.75 1. 0.875 ]
mean value: 0.8285714285714285

key: train_recall
value: [0.92647059 0.89705882 0.92753623 0.86956522 0.92753623 0.69565217 0.92647059 0.72058824 0.92647059 0.95588235]
mean value: 0.8773231031543052

key: test_roc_auc
value: [0.875 0.6875 0.74107143 0.67857143 0.60714286 0.74107143 0.58928571 0.73214286 0.78571429 0.58035714]
mean value: 0.7017857142857142

key: train_roc_auc
value: [0.77205882 0.76470588 0.74317988 0.72890026 0.7652387 0.7963555 0.78932225 0.83130861 0.7748295 0.81852089]
mean value: 0.7784420289855073

key: test_jcc
value: [0.8 0.5 0.6 0.54545455 0.45454545 0.6 0.5 0.6 0.72727273 0.53846154]
mean value: 0.5865734265734266

key: train_jcc
value: [0.67021277 0.65591398 0.64646465 0.6185567 0.66666667 0.63157895 0.68478261 0.68055556 0.67021277 0.72222222]
mean value: 0.6647166858413609

MCC on Blind test: 0.02
Accuracy on Blind test: 0.47

Model_name: Naive Bayes
Model func: BernoulliNB()

Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00780797 0.00748897 0.00779843 0.00788808 0.007725 0.00771189 0.00735712 0.00735378 0.0075953 0.00707674] mean value: 0.007580327987670899 key: score_time value: [0.00868344 0.00823379 0.00862813 0.00852036 0.00870013 0.00808978 0.00818658 0.0080483 0.00845337 0.0080812 ] mean value: 0.008362507820129395 key: test_mcc value: [0.25 0.25819889 0.07142857 0.33928571 0.46428571 0.13363062 0.33928571 0.46428571 0.33928571 0.49099025] mean value: 0.3150676906591499 key: train_mcc value: [0.48788604 0.49441323 0.48933032 0.47900717 0.52059257 0.46076782 0.4312221 0.41698711 0.44522592 0.43208129] mean value: 0.46575135687893415 key: test_accuracy value: [0.625 0.625 0.53333333 0.66666667 0.73333333 0.53333333 0.66666667 0.73333333 0.66666667 0.73333333] mean value: 0.6516666666666666 key: train_accuracy value: [0.74264706 0.74264706 0.74452555 0.73722628 0.75912409 0.72992701 0.71532847 0.7080292 0.72262774 0.71532847] mean value: 0.7317410905968227 key: test_fscore value: [0.625 0.57142857 0.53333333 0.66666667 0.71428571 0.63157895 0.66666667 0.75 0.66666667 0.71428571] mean value: 0.6539912280701754 key: train_fscore value: [0.75524476 0.76510067 0.75177305 0.75675676 0.77241379 0.74125874 0.71942446 0.71428571 0.72058824 0.72340426] mean value: 0.7420250432480666 key: test_precision value: [0.625 0.66666667 0.5 0.625 0.71428571 0.5 0.71428571 0.75 0.71428571 0.83333333] mean value: 0.6642857142857143 key: train_precision value: [0.72 0.7037037 0.73611111 0.70886076 0.73684211 0.71621622 0.70422535 0.69444444 0.72058824 0.69863014] mean value: 
0.7139622064625399 key: test_recall value: [0.625 0.5 0.57142857 0.71428571 0.71428571 0.85714286 0.625 0.75 0.625 0.625 ] mean value: 0.6607142857142857 key: train_recall value: [0.79411765 0.83823529 0.76811594 0.8115942 0.8115942 0.76811594 0.73529412 0.73529412 0.72058824 0.75 ] mean value: 0.7732949701619778 key: test_roc_auc value: [0.625 0.625 0.53571429 0.66964286 0.73214286 0.55357143 0.66964286 0.73214286 0.66964286 0.74107143] mean value: 0.6553571428571429 key: train_roc_auc value: [0.74264706 0.74264706 0.74435209 0.73667945 0.75873828 0.72964621 0.71547315 0.70822677 0.72261296 0.71557971] mean value: 0.7316602728047741 key: test_jcc value: [0.45454545 0.4 0.36363636 0.5 0.55555556 0.46153846 0.5 0.6 0.5 0.55555556] mean value: 0.4890831390831391 key: train_jcc value: [0.60674157 0.61956522 0.60227273 0.60869565 0.62921348 0.58888889 0.56179775 0.55555556 0.56321839 0.56666667] mean value: 0.5902615907742418 MCC on Blind test: 0.1 Accuracy on Blind test: 0.58 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00691152 0.00645924 0.00683975 0.00714922 0.00726128 0.00734305 0.00735426 0.00752354 0.00730586 0.00730991] mean value: 0.0071457624435424805 key: score_time value: [0.00938153 0.00884461 0.00903273 0.0093987 0.00931835 0.00955343 0.00963545 0.0096395 0.00957155 0.00971961] mean value: 0.009409546852111816 key: test_mcc value: [ 0.62994079 0.5 0.49099025 0.6000992 0.49099025 0.32732684 -0.02620712 0.46428571 0.32732684 0.32732684] mean value: 0.4132079591989289 key: train_mcc value: [0.69731096 0.6918501 0.75815907 0.66971076 0.69510727 0.70910029 0.6523446 0.71313464 0.68163703 0.66616982] mean value: 0.6934524542628495 key: test_accuracy value: [0.8125 0.75 0.73333333 0.8 0.73333333 0.66666667 0.46666667 0.73333333 0.66666667 0.66666667] mean value: 0.7029166666666666 key: train_accuracy value: [0.84558824 0.84558824 0.87591241 0.83211679 0.84671533 0.8540146 0.82481752 0.8540146 0.83941606 0.83211679] mean value: 0.8450300558179475 key: test_fscore value: [0.82352941 0.75 0.75 0.76923077 0.75 0.61538462 0.2 0.75 0.70588235 0.70588235] mean value: 0.6819909502262443 key: train_fscore value: [0.85517241 0.84892086 0.88435374 0.84353741 0.85314685 0.85915493 0.83098592 0.86111111 0.84507042 0.83687943] mean value: 0.8518333098052753 key: test_precision value: [0.77777778 0.75 0.66666667 0.83333333 0.66666667 0.66666667 0.5 0.75 0.66666667 0.66666667] mean value: 0.6944444444444444 key: train_precision value: [0.80519481 0.83098592 0.83333333 0.79487179 0.82432432 0.83561644 0.7972973 0.81578947 0.81081081 0.80821918] mean value: 
0.815644337144789 key: test_recall value: [0.875 0.75 0.85714286 0.71428571 0.85714286 0.57142857 0.125 0.75 0.75 0.75 ] mean value: 0.7 key: train_recall value: [0.91176471 0.86764706 0.94202899 0.89855072 0.88405797 0.88405797 0.86764706 0.91176471 0.88235294 0.86764706] mean value: 0.8917519181585678 key: test_roc_auc value: [0.8125 0.75 0.74107143 0.79464286 0.74107143 0.66071429 0.49107143 0.73214286 0.66071429 0.66071429] mean value: 0.7044642857142858 key: train_roc_auc value: [0.84558824 0.84558824 0.87542626 0.8316283 0.84644075 0.85379369 0.82512788 0.85443308 0.8397272 0.83237425] mean value: 0.8450127877237852 key: test_jcc value: [0.7 0.6 0.6 0.625 0.6 0.44444444 0.11111111 0.6 0.54545455 0.54545455] mean value: 0.5371464646464646 key: train_jcc value: [0.74698795 0.7375 0.79268293 0.72941176 0.74390244 0.75308642 0.71084337 0.75609756 0.73170732 0.7195122 ] mean value: 0.7421731948784563 MCC on Blind test: 0.06 Accuracy on Blind test: 0.68 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00869989 0.00870991 0.00899339 0.00884175 0.00880337 0.00869131 0.00819468 0.00880289 0.00880551 0.00904202] mean value: 0.008758473396301269 key: score_time value: [0.00901675 0.00891495 0.00872993 0.00879502 0.00886464 0.00874734 0.00867009 0.00894928 0.00889111 0.00890279] mean value: 0.008848190307617188 key: test_mcc value: [0.75 0.62994079 0.49099025 0.76376262 0.60714286 0.60714286 0.76376262 0.60714286 0.75592895 0.73214286] mean value: 0.6707956647621525 key: train_mcc value: [0.85294118 0.88235294 0.86868474 0.81027501 0.8687127 0.85434012 0.89869927 0.89869927 0.89863497 0.85440207] mean value: 0.8687742253690149 key: test_accuracy value: [0.875 0.8125 0.73333333 0.86666667 0.8 0.8 0.86666667 0.8 0.86666667 0.86666667] mean value: 0.82875 key: train_accuracy value: [0.92647059 0.94117647 0.93430657 0.90510949 0.93430657 0.9270073 0.94890511 0.94890511 0.94890511 0.9270073 ] mean value: 0.9342099613568055 key: test_fscore value: [0.875 0.8 0.75 0.875 0.8 0.8 0.85714286 0.8 0.88888889 0.875 ] mean value: 0.8321031746031746 key: train_fscore value: [0.92647059 0.94117647 0.9352518 0.90647482 0.93430657 0.92857143 0.94964029 0.94964029 0.94736842 0.92753623] mean value: 0.9346436903919317 key: test_precision value: [0.875 0.85714286 0.66666667 0.77777778 0.75 0.75 1. 0.85714286 0.8 0.875 ] mean value: 0.8208730158730159 key: train_precision value: [0.92647059 0.94117647 0.92857143 0.9 0.94117647 0.91549296 0.92957746 0.92957746 0.96923077 0.91428571] mean value: 0.929555932882362 key: test_recall value: [0.875 0.75 0.85714286 1. 
0.85714286 0.85714286 0.75 0.75 1. 0.875 ] mean value: 0.8571428571428571 key: train_recall value: [0.92647059 0.94117647 0.94202899 0.91304348 0.92753623 0.94202899 0.97058824 0.97058824 0.92647059 0.94117647] mean value: 0.9401108269394715 key: test_roc_auc value: [0.875 0.8125 0.74107143 0.875 0.80357143 0.80357143 0.875 0.80357143 0.85714286 0.86607143] mean value: 0.83125 key: train_roc_auc value: [0.92647059 0.94117647 0.93424979 0.90505115 0.93435635 0.92689685 0.94906223 0.94906223 0.94874254 0.92710997] mean value: 0.9342178175618073 key: test_jcc value: [0.77777778 0.66666667 0.6 0.77777778 0.66666667 0.66666667 0.75 0.66666667 0.8 0.77777778] mean value: 0.715 key: train_jcc value: [0.8630137 0.88888889 0.87837838 0.82894737 0.87671233 0.86666667 0.90410959 0.90410959 0.9 0.86486486] mean value: 0.8775691372699304 MCC on Blind test: 0.13 Accuracy on Blind test: 0.69 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn(
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None,
max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.47774863 0.47219229 0.59577179 0.48569918 0.47681928 0.47980237 0.52681231 0.59629416 0.46796393 0.48238492] mean value: 0.506148886680603 key: score_time value: [0.01098704 0.01345611 0.01107907 0.01333833 0.01326942 0.01334047 0.01134348 0.01388001 0.01111221 0.01353312] mean value: 0.012533926963806152 key: test_mcc value: [1. 0.77459667 0.37796447 0.60714286 0.76376262 0.60714286 0.46428571 0.60714286 0.75592895 0.73214286] mean value: 0.6690109846952281 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.875 0.66666667 0.8 0.86666667 0.8 0.73333333 0.8 0.86666667 0.86666667] mean value: 0.8275 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.88888889 0.70588235 0.8 0.875 0.8 0.75 0.8 0.88888889 0.875 ] mean value: 0.8383660130718954 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.8 0.6 0.75 0.77777778 0.75 0.75 0.85714286 0.8 0.875 ] mean value: 0.7959920634920635 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 0.85714286 0.85714286 1. 0.85714286 0.75 0.75 1. 0.875 ] mean value: 0.8946428571428571 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.875 0.67857143 0.80357143 0.875 0.80357143 0.73214286 0.80357143 0.85714286 0.86607143] mean value: 0.8294642857142858 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 0.8 0.54545455 0.66666667 0.77777778 0.66666667 0.6 0.66666667 0.8 0.77777778] mean value: 0.7301010101010101 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.69 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01036239 0.00926948 0.00805068 0.00823069 0.00814795 0.00783706 0.00812101 0.00809956 0.00801969 0.00834656] mean value: 0.008448505401611328 key: score_time value: [0.01827431 0.00897479 0.00929928 0.00881314 0.00856185 0.00861549 0.00854754 0.00869775 0.00881147 0.00856495] mean value: 0.009716057777404785 key: test_mcc value: [1. 0.8819171 1. 1. 0.875 0.87287156 0.87287156 0.75592895 0.87287156 0.875 ] mean value: 0.9006460732538559 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9375 1. 1. 0.93333333 0.93333333 0.93333333 0.86666667 0.93333333 0.93333333] mean value: 0.9470833333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [1. 0.94117647 1. 1. 0.93333333 0.92307692 0.94117647 0.88888889 0.94117647 0.93333333] mean value: 0.9502161890397185 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88888889 1. 1. 0.875 1. 0.88888889 0.8 0.88888889 1. ] mean value: 0.9341666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9732142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9375 1. 1. 0.9375 0.92857143 0.92857143 0.85714286 0.92857143 0.9375 ] mean value: 0.9455357142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.88888889 1. 1. 0.875 0.85714286 0.88888889 0.8 0.88888889 0.875 ] mean value: 0.9073809523809524 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.85 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, 
colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.08768535 0.08900571 0.08186841 0.087538 0.08398795 0.08419561 0.083009 0.08226275 0.08182836 0.0807085 ] mean value: 0.08420896530151367 key: score_time value: [0.01827073 0.01797938 0.01798153 0.01794004 0.01733184 0.01720476 0.01783872 0.01791549 0.017555 0.01719594] mean value: 0.017721343040466308 key: test_mcc value: [1. 0.75 0.73214286 1. 0.875 0.73214286 0.60714286 0.76376262 0.87287156 0.76376262] mean value: 0.8096825364024487 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.875 0.86666667 1. 0.93333333 0.86666667 0.8 0.86666667 0.93333333 0.86666667] mean value: 0.9008333333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.875 0.85714286 1. 0.93333333 0.85714286 0.8 0.85714286 0.94117647 0.85714286] mean value: 0.8978081232492997 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.875 0.85714286 1. 0.875 0.85714286 0.85714286 1. 0.88888889 1. ] mean value: 0.921031746031746 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 0.85714286 1. 1. 0.85714286 0.75 0.75 1. 0.75 ] mean value: 0.8839285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.875 0.86607143 1. 0.9375 0.86607143 0.80357143 0.875 0.92857143 0.875 ] mean value: 0.9026785714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 0.77777778 0.75 1. 0.875 0.75 0.66666667 0.75 0.88888889 0.75 ] mean value: 0.8208333333333333 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.81 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging 
Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00700188 0.00755453 0.00768661 0.00771928 0.00732183 0.00702024 0.00708175 0.00722766 0.00722528 0.00710702] mean value: 0.007294607162475586 key: score_time value: [0.00804567 0.00840735 0.0089283 0.0083468 0.0084908 0.00807238 0.00812387 0.0082643 0.00818753 0.00800323] mean value: 0.00828702449798584 key: test_mcc value: [0.8819171 0.8819171 0.73214286 1. 0.76376262 0.46428571 0.49099025 0.60714286 0.875 0.87287156] mean value: 0.7570030065748747 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.9375 0.86666667 1. 0.86666667 0.73333333 0.73333333 0.8 0.93333333 0.93333333] mean value: 0.8741666666666666 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_fscore value: [0.94117647 0.93333333 0.85714286 1. 0.875 0.71428571 0.71428571 0.8 0.93333333 0.94117647] mean value: 0.8709733893557423 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 1. 0.85714286 1. 0.77777778 0.71428571 0.83333333 0.85714286 1. 0.88888889] mean value: 0.8817460317460317 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 0.85714286 1. 1. 0.71428571 0.625 0.75 0.875 1. ] mean value: 0.8696428571428572 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.9375 0.86607143 1. 0.875 0.73214286 0.74107143 0.80357143 0.9375 0.92857143] mean value: 0.8758928571428571 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.875 0.75 1. 0.77777778 0.55555556 0.55555556 0.66666667 0.875 0.88888889] mean value: 0.7833333333333333 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.73 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [1.0097611 1.00532484 1.01266861 1.00928307 1.0162096 1.00374866 1.01397824 1.01551723 1.02490425 1.04691529] mean value: 1.0158310890197755 key: score_time value: [0.15017748 0.09301543 0.09229612 0.09591055 0.09012294 0.09085989 0.09002423 0.09411788 0.09723639 0.09498525] mean value: 0.09887461662292481 key: test_mcc value: [1. 0.8819171 0.76376262 1. 0.875 0.73214286 0.60714286 0.73214286 0.87287156 0.875 ] mean value: 0.8339979851886711 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9375 0.86666667 1. 0.93333333 0.86666667 0.8 0.86666667 0.93333333 0.93333333] mean value: 0.9137500000000001 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94117647 0.875 1. 0.93333333 0.85714286 0.8 0.875 0.94117647 0.93333333] mean value: 0.9156162464985994 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88888889 0.77777778 1. 
0.875 0.85714286 0.85714286 0.875 0.88888889 1. ] mean value: 0.901984126984127 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 0.75 0.875 1. 0.875 ] mean value: 0.9357142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9375 0.875 1. 0.9375 0.86607143 0.80357143 0.86607143 0.92857143 0.9375 ] mean value: 0.9151785714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.88888889 0.77777778 1. 0.875 0.75 0.66666667 0.77777778 0.88888889 0.875 ] mean value: 0.85 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.83 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, 
max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. 
To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.84915781 0.96325994 0.88296103 0.89262009 0.85590529 0.8617866 0.86219358 0.88421559 0.87171268 0.90677834] mean value: 0.8830590963363647 key: score_time value: [0.23183656 0.20871425 0.23059011 0.22569108 0.22029448 0.24542952 0.22994447 0.24367285 0.24643469 0.23650432] mean value: 0.23191123008728026 key: test_mcc value: [1. 0.75 0.76376262 1. 0.73214286 0.60714286 0.60714286 0.73214286 0.87287156 0.875 ] mean value: 0.794020560534137 key: train_mcc value: [0.98540068 0.94117647 0.98550418 0.97120941 0.94160273 0.98550418 0.98550725 0.98550725 0.97122151 0.97122151] mean value: 0.9723855158091337 key: test_accuracy value: [1. 0.875 0.86666667 1. 0.86666667 0.8 0.8 0.86666667 0.93333333 0.93333333] mean value: 0.8941666666666667 key: train_accuracy value: [0.99264706 0.97058824 0.99270073 0.98540146 0.97080292 0.99270073 0.99270073 0.99270073 0.98540146 0.98540146] mean value: 0.986104551309575 key: test_fscore value: [1. 0.875 0.875 1. 0.85714286 0.8 0.8 0.875 0.94117647 0.93333333] mean value: 0.8956652661064426 key: train_fscore value: [0.99270073 0.97058824 0.99280576 0.98571429 0.97101449 0.99280576 0.99270073 0.99270073 0.98550725 0.98550725] mean value: 0.9862045207088039 key: test_precision value: [1. 0.875 0.77777778 1. 0.85714286 0.75 0.85714286 0.875 0.88888889 1. ] mean value: 0.888095238095238 key: train_precision value: [0.98550725 0.97058824 0.98571429 0.97183099 0.97101449 0.98571429 0.98550725 0.98550725 0.97142857 0.97142857] mean value: 0.9784241167379383 key: test_recall value: [1. 0.875 1. 1. 0.85714286 0.85714286 0.75 0.875 1. 
0.875 ] mean value: 0.9089285714285714 key: train_recall value: [1. 0.97058824 1. 1. 0.97101449 1. 1. 1. 1. 1. ] mean value: 0.994160272804774 key: test_roc_auc value: [1. 0.875 0.875 1. 0.86607143 0.80357143 0.80357143 0.86607143 0.92857143 0.9375 ] mean value: 0.8955357142857143 key: train_roc_auc value: [0.99264706 0.97058824 0.99264706 0.98529412 0.97080136 0.99264706 0.99275362 0.99275362 0.98550725 0.98550725] mean value: 0.986114663256607 key: test_jcc value: [1. 0.77777778 0.77777778 1. 0.75 0.66666667 0.66666667 0.77777778 0.88888889 0.875 ] mean value: 0.8180555555555555 key: train_jcc value: [0.98550725 0.94285714 0.98571429 0.97183099 0.94366197 0.98571429 0.98550725 0.98550725 0.97142857 0.97142857] mean value: 0.9729157554019771 MCC on Blind test: 0.1 Accuracy on Blind test: 0.8 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, 
n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01659274 0.00693059 0.00675678 0.0067389 0.00669909 0.00674319 0.00680494 0.00672269 0.00675416 0.00671124] mean value: 0.007745432853698731 key: score_time value: [0.01080561 0.00839496 0.00837088 0.00777602 0.00775194 0.00776267 0.00777245 0.00776482 0.00776577 0.00774288] mean value: 0.008190798759460449 key: test_mcc value: [0.25 0.25819889 0.07142857 0.33928571 0.46428571 0.13363062 0.33928571 0.46428571 0.33928571 0.49099025] mean value: 0.3150676906591499 key: train_mcc value: [0.48788604 0.49441323 0.48933032 0.47900717 0.52059257 0.46076782 0.4312221 0.41698711 0.44522592 0.43208129] mean value: 0.46575135687893415 key: test_accuracy value: [0.625 0.625 0.53333333 0.66666667 0.73333333 0.53333333 0.66666667 0.73333333 0.66666667 0.73333333] mean value: 0.6516666666666666 key: train_accuracy value: [0.74264706 0.74264706 0.74452555 0.73722628 0.75912409 0.72992701 0.71532847 0.7080292 0.72262774 0.71532847] mean value: 0.7317410905968227 key: test_fscore value: [0.625 0.57142857 0.53333333 0.66666667 0.71428571 0.63157895 0.66666667 0.75 0.66666667 0.71428571] mean value: 0.6539912280701754 key: train_fscore value: [0.75524476 0.76510067 0.75177305 0.75675676 0.77241379 0.74125874 0.71942446 0.71428571 0.72058824 0.72340426] mean value: 0.7420250432480666 key: test_precision value: [0.625 0.66666667 0.5 0.625 0.71428571 0.5 0.71428571 0.75 0.71428571 0.83333333] mean value: 0.6642857142857143 key: train_precision value: [0.72 0.7037037 0.73611111 0.70886076 0.73684211 0.71621622 0.70422535 0.69444444 0.72058824 0.69863014] mean value: 
0.7139622064625399 key: test_recall value: [0.625 0.5 0.57142857 0.71428571 0.71428571 0.85714286 0.625 0.75 0.625 0.625 ] mean value: 0.6607142857142857 key: train_recall value: [0.79411765 0.83823529 0.76811594 0.8115942 0.8115942 0.76811594 0.73529412 0.73529412 0.72058824 0.75 ] mean value: 0.7732949701619778 key: test_roc_auc value: [0.625 0.625 0.53571429 0.66964286 0.73214286 0.55357143 0.66964286 0.73214286 0.66964286 0.74107143] mean value: 0.6553571428571429 key: train_roc_auc value: [0.74264706 0.74264706 0.74435209 0.73667945 0.75873828 0.72964621 0.71547315 0.70822677 0.72261296 0.71557971] mean value: 0.7316602728047741 key: test_jcc value: [0.45454545 0.4 0.36363636 0.5 0.55555556 0.46153846 0.5 0.6 0.5 0.55555556] mean value: 0.4890831390831391 key: train_jcc value: [0.60674157 0.61956522 0.60227273 0.60869565 0.62921348 0.58888889 0.56179775 0.55555556 0.56321839 0.56666667] mean value: 0.5902615907742418 MCC on Blind test: 0.1 Accuracy on Blind test: 0.58 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', 
DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 
'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.05043268 0.03555036 0.05953455 0.03447628 0.03820968 0.0348382 0.03491855 0.03489041 0.03481722 0.03513861] mean value: 0.03928065299987793 key: score_time value: [0.01032662 0.01029825 0.0103786 0.01031733 0.01061702 0.01034212 0.01034379 0.01036835 0.01033378 0.01031613] mean value: 0.010364198684692382 key: test_mcc value: [1. 0.8819171 1. 1. 0.875 1. 0.87287156 1. 0.87287156 0.875 ] mean value: 0.9377660225576135 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9375 1. 1. 0.93333333 1. 0.93333333 1. 0.93333333 0.93333333] mean value: 0.9670833333333333 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94117647 1. 1. 0.93333333 1. 0.94117647 1. 0.94117647 0.93333333] mean value: 0.9690196078431372 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88888889 1. 1. 0.875 1. 0.88888889 1. 0.88888889 1. ] mean value: 0.9541666666666666 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.875] mean value: 0.9875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9375 1. 1. 0.9375 1. 0.92857143 1. 0.92857143 0.9375 ] mean value: 0.9669642857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.88888889 1. 1. 0.875 1. 0.88888889 1. 
0.88888889 0.875 ] mean value: 0.9416666666666667 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.84 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01003599 0.01160932 0.01194644 0.01213264 0.0120163 0.01201224 0.01192141 0.01221108 0.01222897 0.01220608] mean value: 0.011832046508789062 key: score_time value: [0.01034403 0.01014495 0.01055765 0.01064253 0.01057649 0.01061487 0.01058221 0.01066971 0.01062822 0.01060128] mean value: 0.01053619384765625 key: test_mcc value: [0.8819171 0.77459667 0.49099025 1. 0.73214286 0.73214286 0.87287156 0.76376262 0.75592895 0.75592895] mean value: 0.7760281809053229 key: train_mcc value: [0.89949371 0.91533482 0.90246052 0.91392776 0.92787101 0.95710706 0.9139999 0.92951942 0.92791659 0.92951942] mean value: 0.9217150203470457 key: test_accuracy value: [0.9375 0.875 0.73333333 1. 
0.86666667 0.86666667 0.93333333 0.86666667 0.86666667 0.86666667] mean value: 0.88125 key: train_accuracy value: [0.94852941 0.95588235 0.94890511 0.95620438 0.96350365 0.97810219 0.95620438 0.96350365 0.96350365 0.96350365] mean value: 0.9597842421640189 key: test_fscore value: [0.94117647 0.88888889 0.75 1. 0.85714286 0.85714286 0.94117647 0.85714286 0.88888889 0.88888889] mean value: 0.8870448179271708 key: train_fscore value: [0.95035461 0.95774648 0.95172414 0.95774648 0.96453901 0.9787234 0.95714286 0.96453901 0.96402878 0.96453901] mean value: 0.961108376525978 key: test_precision value: [0.88888889 0.8 0.66666667 1. 0.85714286 0.85714286 0.88888889 1. 0.8 0.8 ] mean value: 0.8558730158730159 key: train_precision value: [0.91780822 0.91891892 0.90789474 0.93150685 0.94444444 0.95833333 0.93055556 0.93150685 0.94366197 0.93150685] mean value: 0.9316137728048631 key: test_recall value: [1. 1. 0.85714286 1. 0.85714286 0.85714286 1. 0.75 1. 1. ] mean value: 0.9321428571428572 key: train_recall value: [0.98529412 1. 1. 0.98550725 0.98550725 1. 0.98529412 1. 0.98529412 1. ] mean value: 0.99268968456948 key: test_roc_auc value: [0.9375 0.875 0.74107143 1. 0.86607143 0.86607143 0.92857143 0.875 0.85714286 0.85714286] mean value: 0.8803571428571428 key: train_roc_auc value: [0.94852941 0.95588235 0.94852941 0.95598892 0.96334186 0.97794118 0.95641517 0.96376812 0.96366155 0.96376812] mean value: 0.9597826086956522 key: test_jcc value: [0.88888889 0.8 0.6 1. 
0.75 0.75 0.88888889 0.75 0.8 0.8 ] mean value: 0.8027777777777778 key: train_jcc value: [0.90540541 0.91891892 0.90789474 0.91891892 0.93150685 0.95833333 0.91780822 0.93150685 0.93055556 0.93150685] mean value: 0.9252355636097525 MCC on Blind test: 0.05 Accuracy on Blind test: 0.64 Model_name: Multinomial Model func: MultinomialNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.00948811 0.00723052 0.00743365 0.00692177 0.00683284 0.00688004 0.00742602 0.00687981 0.00724554 0.00696445] mean value: 0.00733027458190918 key: score_time value: [0.01050973 0.00838256 0.00804853 0.00779343 0.00792456 0.00785303 0.00853586 0.00783825 0.00832796 0.00790906] mean value: 0.008312296867370606 key: test_mcc value: [0.37796447 0.25819889 0.37796447 0.32732684 0.60714286 0.37796447 0.49099025 0.33928571 0.33928571 0.46428571] mean value: 0.3960409397159814 key: train_mcc value: [0.47243088 0.54894692 0.5182264 0.47592003 0.46076782 0.5335339 0.4599318 0.4312221 0.47473887 0.47442455] mean value: 0.4850143267959903 key: test_accuracy value: [0.6875 0.625 0.66666667 0.66666667 0.8 0.66666667 0.73333333 0.66666667 0.66666667 0.73333333] mean value: 0.6912499999999999 key: train_accuracy value: [0.73529412 
0.77205882 0.75912409 0.73722628 0.72992701 0.76642336 0.72992701 0.71532847 0.73722628 0.73722628] mean value: 0.7419761700300558 key: test_fscore value: [0.66666667 0.57142857 0.70588235 0.61538462 0.8 0.70588235 0.71428571 0.66666667 0.66666667 0.75 ] mean value: 0.6862863606981253 key: train_fscore value: [0.74647887 0.7862069 0.76258993 0.75 0.74125874 0.77464789 0.72992701 0.71942446 0.73913043 0.73529412] mean value: 0.7484958346591992 key: test_precision value: [0.71428571 0.66666667 0.6 0.66666667 0.75 0.6 0.83333333 0.71428571 0.71428571 0.75 ] mean value: 0.700952380952381 key: train_precision value: [0.71621622 0.74025974 0.75714286 0.72 0.71621622 0.75342466 0.72463768 0.70422535 0.72857143 0.73529412] mean value: 0.729598826685986 key: test_recall value: [0.625 0.5 0.85714286 0.57142857 0.85714286 0.85714286 0.625 0.625 0.625 0.75 ] mean value: 0.6892857142857143 key: train_recall value: [0.77941176 0.83823529 0.76811594 0.7826087 0.76811594 0.79710145 0.73529412 0.73529412 0.75 0.73529412] mean value: 0.7689471440750213 key: test_roc_auc value: [0.6875 0.625 0.67857143 0.66071429 0.80357143 0.67857143 0.74107143 0.66964286 0.66964286 0.73214286] mean value: 0.6946428571428571 key: train_roc_auc value: [0.73529412 0.77205882 0.75905797 0.73689258 0.72964621 0.76619778 0.7299659 0.71547315 0.73731884 0.73721228] mean value: 0.7419117647058824 key: test_jcc value: [0.5 0.4 0.54545455 0.44444444 0.66666667 0.54545455 0.55555556 0.5 0.5 0.6 ] mean value: 0.5257575757575758 key: train_jcc value: [0.59550562 0.64772727 0.61627907 0.6 0.58888889 0.63218391 0.57471264 0.56179775 0.5862069 0.58139535] mean value: 0.5984697399283192 MCC on Blind test: 0.1 Accuracy on Blind test: 0.6 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 
'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00826144 0.00782967 0.00737524 0.00810552 0.00831509 0.00756001 0.00788832 0.00767446 0.00802422 0.00808358] mean value: 0.007911753654479981 key: score_time value: [0.00887156 0.00873542 0.00787902 0.00805974 0.00852108 0.00807381 0.00853586 0.00868702 0.00864601 0.00846553] mean value: 0.008447504043579102 key: test_mcc value: [0.77459667 0.5 0.47245559 0.64465837 0.73214286 0.60714286 0.64465837 0.87287156 0.64465837 0.6000992 ] mean value: 0.6493283847542592 key: train_mcc value: [0.76894131 0.91334626 0.54803747 0.87326937 0.94160273 0.83757093 0.91597649 0.88476385 0.87099729 0.88476385] mean value: 0.8439269536443883 key: test_accuracy value: [0.875 0.75 0.73333333 0.8 0.86666667 0.8 0.8 0.93333333 0.8 0.8 ] mean value: 0.8158333333333334 key: train_accuracy value: [0.875 0.95588235 0.72992701 0.93430657 0.97080292 0.91240876 0.95620438 0.94160584 0.93430657 0.94160584] mean value: 0.9152050236152856 key: test_fscore value: [0.85714286 0.75 0.66666667 0.72727273 0.85714286 0.8 0.84210526 0.94117647 0.84210526 0.82352941] mean value: 0.8107141516893839 key: train_fscore value: [0.85950413 0.95454545 0.63366337 0.93129771 0.97101449 0.92 0.95774648 0.94285714 0.93617021 0.94285714] mean value: 0.9049656133144264 key: test_precision value: [1. 0.75 0.8 1. 
0.85714286 0.75 0.72727273 0.88888889 0.72727273 0.77777778] mean value: 0.8278354978354978 key: train_precision value: [0.98113208 0.984375 1. 0.98387097 0.97101449 0.85185185 0.91891892 0.91666667 0.90410959 0.91666667] mean value: 0.9428606229112457 key: test_recall value: [0.75 0.75 0.57142857 0.57142857 0.85714286 0.85714286 1. 1. 1. 0.875 ] mean value: 0.8232142857142857 key: train_recall value: [0.76470588 0.92647059 0.46376812 0.88405797 0.97101449 1. 1. 0.97058824 0.97058824 0.97058824] mean value: 0.8921781756180733 key: test_roc_auc value: [0.875 0.75 0.72321429 0.78571429 0.86607143 0.80357143 0.78571429 0.92857143 0.78571429 0.79464286] mean value: 0.8098214285714286 key: train_roc_auc value: [0.875 0.95588235 0.73188406 0.93467604 0.97080136 0.91176471 0.95652174 0.94181586 0.93456948 0.94181586] mean value: 0.9154731457800511 key: test_jcc value: [0.75 0.6 0.5 0.57142857 0.75 0.66666667 0.72727273 0.88888889 0.72727273 0.7 ] mean value: 0.6881529581529582 key: train_jcc value: [0.75362319 0.91304348 0.46376812 0.87142857 0.94366197 0.85185185 0.91891892 0.89189189 0.88 0.89189189] mean value: 0.8380079880422807 MCC on Blind test: 0.06 Accuracy on Blind test: 0.89 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00997114 0.01002455 0.00783682 0.00783896 0.00782728 0.00736952 0.00789356 0.00743914 0.00764585 0.0072844 ] mean value: 0.00811312198638916 key: score_time value: [0.01067495 0.00957513 0.00809073 0.00825882 0.0082314 0.00791526 0.00799417 0.00835299 0.00847554 0.00832438] mean value: 0.008589339256286622 key: test_mcc value: [0.77459667 0.37796447 0.36689969 0.60714286 0.49099025 0.73214286 0.6000992 0.73214286 0.75592895 0.73214286] mean value: 0.6170050660873226 key: train_mcc value: [0.72669793 0.88580789 0.78788403 0.74493056 0.77817796 0.91597649 0.92951942 0.85434012 0.86000692 0.91240409] mean value: 0.8395745411348854 key: test_accuracy value: [0.875 0.6875 0.6 0.8 0.73333333 0.86666667 0.8 0.86666667 0.86666667 0.86666667] mean value: 0.79625 key: train_accuracy value: [0.84558824 0.94117647 0.88321168 0.86861314 0.88321168 0.95620438 0.96350365 0.9270073 0.9270073 0.95620438] mean value: 0.9151728209531989 key: test_fscore value: [0.85714286 0.66666667 0.7 0.8 0.75 0.85714286 0.82352941 0.875 0.88888889 0.875 ] mean value: 0.8093370681605976 key: train_fscore value: [0.8173913 0.93846154 0.8961039 0.87837838 0.89333333 0.95454545 0.96453901 0.92537313 0.93055556 0.95588235] mean value: 0.9154563955087716 key: test_precision value: [1. 0.71428571 0.53846154 0.75 0.66666667 0.85714286 0.77777778 0.875 0.8 0.875 ] mean value: 0.7854334554334554 key: train_precision value: [1. 0.98387097 0.81176471 0.82278481 0.82716049 1. 
0.93150685 0.93939394 0.88157895 0.95588235] mean value: 0.9153943066596637 key: test_recall value: [0.75 0.625 1. 0.85714286 0.85714286 0.85714286 0.875 0.875 1. 0.875 ] mean value: 0.8571428571428571 key: train_recall value: [0.69117647 0.89705882 1. 0.94202899 0.97101449 0.91304348 1. 0.91176471 0.98529412 0.95588235] mean value: 0.9267263427109974 key: test_roc_auc value: [0.875 0.6875 0.625 0.80357143 0.74107143 0.86607143 0.79464286 0.86607143 0.85714286 0.86607143] mean value: 0.7982142857142858 key: train_roc_auc value: [0.84558824 0.94117647 0.88235294 0.86807332 0.88256607 0.95652174 0.96376812 0.92689685 0.92742967 0.95620205] mean value: 0.9150575447570333 key: test_jcc value: [0.75 0.5 0.53846154 0.66666667 0.6 0.75 0.7 0.77777778 0.8 0.77777778] mean value: 0.6860683760683761 key: train_jcc value: [0.69117647 0.88405797 0.81176471 0.78313253 0.80722892 0.91304348 0.93150685 0.86111111 0.87012987 0.91549296] mean value: 0.8468644859831611 MCC on Blind test: 0.04 Accuracy on Blind test: 0.85 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.07661414 0.06280541 0.0673337 0.06278038 0.06312895 0.06578207 0.06545353 0.06469059 0.06454372 0.06606507] mean value: 0.06591975688934326 key: score_time value: [0.01440525 0.01476049 0.01512003 0.01427507 0.01462126 0.01519179 0.01491117 0.01524901 0.01458573 0.01450872] mean value: 0.01476285457611084 key: test_mcc value: [1. 0.8819171 0.76376262 0.875 0.73214286 0.87287156 0.87287156 1. 0.87287156 0.73214286] mean value: 0.8603580116631793 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9375 0.86666667 0.93333333 0.86666667 0.93333333 0.93333333 1. 0.93333333 0.86666667] mean value: 0.9270833333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94117647 0.875 0.93333333 0.85714286 0.92307692 0.94117647 1. 0.94117647 0.875 ] mean value: 0.9287082525317819 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88888889 0.77777778 0.875 0.85714286 1. 0.88888889 1. 0.88888889 0.875 ] mean value: 0.9051587301587302 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.85714286 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9589285714285715 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9375 0.875 0.9375 0.86607143 0.92857143 0.92857143 1. 0.92857143 0.86607143] mean value: 0.9267857142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 0.88888889 0.77777778 0.875 0.75 0.85714286 0.88888889 1. 0.88888889 0.77777778] mean value: 0.870436507936508 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.76 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.02507782 0.02816606 0.04579067 0.0453043 0.04246235 0.02354765 0.0451479 0.02337193 0.02406144 0.03010583] mean value: 0.0333035945892334 key: score_time value: [0.0173595 0.02051401 0.03634977 0.03514004 0.01607704 0.03413272 0.02627468 0.01672935 0.02149177 0.03459334] mean value: 0.025866222381591798 key: test_mcc value: [1. 0.8819171 1. 1. 0.875 0.73214286 0.87287156 1. 0.87287156 0.875 ] mean value: 0.9109803082718992 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 0.98550725 1. 1. ] mean value: 0.9985507246376811 key: test_accuracy value: [1. 0.9375 1. 1. 0.93333333 0.86666667 0.93333333 1. 0.93333333 0.93333333] mean value: 0.95375 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 0.99270073 1. 1. ] mean value: 0.9992700729927008 key: test_fscore value: [1. 0.94117647 1. 1. 0.93333333 0.85714286 0.94117647 1. 
0.94117647 0.93333333] mean value: 0.954733893557423 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 0.99270073 1. 1. ] mean value: 0.9992700729927008 key: test_precision value: [1. 0.88888889 1. 1. 0.875 0.85714286 0.88888889 1. 0.88888889 1. ] mean value: 0.9398809523809524 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 0.98550725 1. 1. ] mean value: 0.9985507246376811 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9732142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9375 1. 1. 0.9375 0.86607143 0.92857143 1. 0.92857143 0.9375 ] mean value: 0.9535714285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 0.99275362 1. 1. ] mean value: 0.9992753623188406 key: test_jcc value: [1. 0.88888889 1. 1. 0.875 0.75 0.88888889 1. 0.88888889 0.875 ] mean value: 0.9166666666666666 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 0.98550725 1. 1. ] mean value: 0.9985507246376811 MCC on Blind test: 0.13 Accuracy on Blind test: 0.86 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.03250933 0.03897357 0.01707721 0.01700234 0.01710153 0.01721501 0.04007459 0.0396409 0.03977489 0.04000473] mean value: 0.02993741035461426 key: score_time value: [0.01946115 0.01910114 0.01107264 0.01099253 0.01101065 0.01094913 0.02085972 0.01942563 0.01113796 0.02103782] mean value: 0.015504837036132812 key: test_mcc value: [0.8819171 0.75 0.37796447 0.73214286 0.76376262 0.73214286 0.76376262 0.60714286 0.6000992 0.60714286] mean value: 0.6816077435069778 key: train_mcc value: [0.95598573 0.98540068 0.98550418 0.98550725 0.97080136 0.97080136 0.98550725 0.97080136 0.97120941 0.97080136] mean value: 0.9752319946905791 key: test_accuracy value: [0.9375 0.875 0.66666667 0.86666667 0.86666667 0.86666667 0.86666667 0.8 0.8 0.8 ] mean value: 0.8345833333333333 key: train_accuracy value: [0.97794118 0.99264706 0.99270073 0.99270073 0.98540146 0.98540146 0.99270073 0.98540146 0.98540146 0.98540146] mean value: 0.9875697724345213 key: test_fscore value: [0.94117647 0.875 0.70588235 0.85714286 0.875 0.85714286 0.85714286 0.8 0.82352941 0.8 ] mean value: 0.8392016806722689 key: train_fscore value: [0.97777778 0.99259259 0.99280576 0.99270073 0.98550725 0.98550725 0.99270073 0.98529412 0.98507463 0.98529412] mean value: 0.9875254940533481 key: test_precision value: [0.88888889 0.875 0.6 0.85714286 0.77777778 0.85714286 1. 0.85714286 0.77777778 0.85714286] mean value: 0.8348015873015873 key: train_precision value: [0.98507463 1. 0.98571429 1. 0.98550725 0.98550725 0.98550725 0.98529412 1. 
0.98529412] mean value: 0.989789888700451 key: test_recall value: [1. 0.875 0.85714286 0.85714286 1. 0.85714286 0.75 0.75 0.875 0.75 ] mean value: 0.8571428571428571 key: train_recall value: [0.97058824 0.98529412 1. 0.98550725 0.98550725 0.98550725 1. 0.98529412 0.97058824 0.98529412] mean value: 0.9853580562659847 key: test_roc_auc value: [0.9375 0.875 0.67857143 0.86607143 0.875 0.86607143 0.875 0.80357143 0.79464286 0.80357143] mean value: 0.8375 key: train_roc_auc value: [0.97794118 0.99264706 0.99264706 0.99275362 0.98540068 0.98540068 0.99275362 0.98540068 0.98529412 0.98540068] mean value: 0.9875639386189259 key: test_jcc value: [0.88888889 0.77777778 0.54545455 0.75 0.77777778 0.75 0.75 0.66666667 0.7 0.66666667] mean value: 0.7273232323232323 key: train_jcc value: [0.95652174 0.98529412 0.98571429 0.98550725 0.97142857 0.97142857 0.98550725 0.97101449 0.97058824 0.97101449] mean value: 0.9754018998903909 MCC on Blind test: 0.06 Accuracy on Blind test: 0.66 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.10531926 0.09862638 0.1012013 0.09502292 0.09023929 0.08958817 0.09062433 0.07970119 0.08648419 0.07744169] mean value: 0.09142487049102783 key: score_time value: [0.00943542 0.00918198 0.00938845 0.00950336 0.00970459 0.00936961 0.00830388 0.00854349 0.00833607 0.00825047] mean value: 0.009001731872558594 key: test_mcc value: [0.8819171 0.8819171 1. 1. 0.875 0.87287156 0.87287156 0.87287156 0.87287156 0.875 ] mean value: 0.9005320451152271 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.9375 1. 1. 0.93333333 0.93333333 0.93333333 0.93333333 0.93333333 0.93333333] mean value: 0.9475 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.94117647 1. 1. 0.93333333 0.92307692 0.94117647 0.94117647 0.94117647 0.93333333] mean value: 0.9495625942684767 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.88888889 1. 1. 0.875 1. 0.88888889 0.88888889 0.88888889 1. ] mean value: 0.9319444444444445 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9732142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.9375 1. 1. 0.9375 0.92857143 0.92857143 0.92857143 0.92857143 0.9375 ] mean value: 0.9464285714285714 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.88888889 0.88888889 1. 1. 0.875 0.85714286 0.88888889 0.88888889 0.88888889 0.875 ] mean value: 0.9051587301587302 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.83 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00895786 0.01093102 0.01075339 0.01098084 0.01611209 0.01137829 0.01158309 0.01124573 0.01122022 0.01167703] mean value: 0.01148395538330078 key: score_time value: [0.01024771 0.01018643 0.01022196 0.01059794 0.010957 0.01306653 0.01069069 0.01379347 0.01385617 0.01327443] mean value: 0.011689233779907226 key: test_mcc value: [1. 0.67419986 0.75592895 0.75592895 0.75592895 0.53452248 0.56407607 0.60714286 0.76376262 0.76376262] mean value: 0.7175253347956024 key: train_mcc value: [0.98540068 1. 1. 1. 1. 1. 0.87609014 1. 1. 1. ] mean value: 0.9861490818102587 key: test_accuracy value: [1. 0.8125 0.86666667 0.86666667 0.86666667 0.73333333 0.73333333 0.8 0.86666667 0.86666667] mean value: 0.84125 key: train_accuracy value: [0.99264706 1. 1. 1. 1. 1. 0.93430657 1. 1. 1. 
] mean value: 0.9926953628166595 key: test_fscore value: [1. 0.76923077 0.83333333 0.83333333 0.83333333 0.6 0.66666667 0.8 0.85714286 0.85714286] mean value: 0.805018315018315 key: train_fscore value: [0.99259259 1. 1. 1. 1. 1. 0.92913386 1. 1. 1. ] mean value: 0.9921726450860309 key: test_precision value: [1. 1. 1. 1. 1. 1. 1. 0.85714286 1. 1. ] mean value: 0.9857142857142858 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.625 0.71428571 0.71428571 0.71428571 0.42857143 0.5 0.75 0.75 0.75 ] mean value: 0.6946428571428571 key: train_recall value: [0.98529412 1. 1. 1. 1. 1. 0.86764706 1. 1. 1. ] mean value: 0.9852941176470589 key: test_roc_auc value: [1. 0.8125 0.85714286 0.85714286 0.85714286 0.71428571 0.75 0.80357143 0.875 0.875 ] mean value: 0.8401785714285714 key: train_roc_auc value: [0.99264706 1. 1. 1. 1. 1. 0.93382353 1. 1. 1. ] mean value: 0.9926470588235294 key: test_jcc value: [1. 0.625 0.71428571 0.71428571 0.71428571 0.42857143 0.5 0.66666667 0.75 0.75 ] mean value: 0.6863095238095238 key: train_jcc value: [0.98529412 1. 1. 1. 1. 1. 0.86764706 1. 1. 1. 
] mean value: 0.9852941176470589 MCC on Blind test: -0.02 Accuracy on Blind test: 0.95 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01463962 0.00981808 0.00781131 0.00768018 0.00760412 0.00761986 0.00743604 0.00750971 0.00748968 0.00746846] mean value: 0.008507704734802246 key: score_time value: [0.01040816 0.0082798 0.00810122 0.00809526 0.00800514 0.0080471 0.00783396 0.00794506 0.00791073 0.00810385] mean value: 0.008273029327392578 key: test_mcc value: [0.8819171 0.62994079 0.37796447 0.87287156 0.73214286 0.73214286 0.75592895 1. 0.75592895 0.6000992 ] mean value: 0.7338936730461708 key: train_mcc value: [0.82388584 0.88273483 0.85440207 0.85434012 0.89863497 0.88320546 0.90025835 0.84026462 0.88360693 0.86948194] mean value: 0.8690815123547234 key: test_accuracy value: [0.9375 0.8125 0.66666667 0.93333333 0.86666667 0.86666667 0.86666667 1. 
0.86666667 0.8 ] mean value: 0.8616666666666667 key: train_accuracy value: [0.91176471 0.94117647 0.9270073 0.9270073 0.94890511 0.94160584 0.94890511 0.91970803 0.94160584 0.93430657] mean value: 0.9341992271361099 key: test_fscore value: [0.93333333 0.82352941 0.70588235 0.92307692 0.85714286 0.85714286 0.88888889 1. 0.88888889 0.82352941] mean value: 0.8701414924944337 key: train_fscore value: [0.91304348 0.94202899 0.92647059 0.92857143 0.95035461 0.94202899 0.95035461 0.92086331 0.94202899 0.9352518 ] mean value: 0.9350996779361157 key: test_precision value: [1. 0.77777778 0.6 1. 0.85714286 0.85714286 0.8 1. 0.8 0.77777778] mean value: 0.846984126984127 key: train_precision value: [0.9 0.92857143 0.94029851 0.91549296 0.93055556 0.94202899 0.91780822 0.90140845 0.92857143 0.91549296] mean value: 0.9220228491043612 key: test_recall value: [0.875 0.875 0.85714286 0.85714286 0.85714286 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9053571428571429 key: train_recall value: [0.92647059 0.95588235 0.91304348 0.94202899 0.97101449 0.94202899 0.98529412 0.94117647 0.95588235 0.95588235] mean value: 0.9488704177323103 key: test_roc_auc value: [0.9375 0.8125 0.67857143 0.92857143 0.86607143 0.86607143 0.85714286 1. 0.85714286 0.79464286] mean value: 0.8598214285714286 key: train_roc_auc value: [0.91176471 0.94117647 0.92710997 0.92689685 0.94874254 0.94160273 0.9491688 0.9198636 0.94170929 0.93446292] mean value: 0.9342497868712702 key: test_jcc value: [0.875 0.7 0.54545455 0.85714286 0.75 0.75 0.8 1. 
0.8 0.7 ] mean value: 0.7777597402597403 key: train_jcc value: [0.84 0.89041096 0.8630137 0.86666667 0.90540541 0.89041096 0.90540541 0.85333333 0.89041096 0.87837838] mean value: 0.8783435764531655 MCC on Blind test: 0.07 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, 
oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.07311392 0.06029248 0.06152678 0.05939054 0.05958748 0.05935431 0.05937529 0.05921316 0.0613966 0.06374431] mean value: 0.06169948577880859 key: score_time value: [0.00807023 0.00803971 0.00803089 0.00806856 0.00811672 0.00805664 0.00809479 0.00807714 0.00883532 0.0086627 ] mean value: 0.008205270767211914 key: test_mcc value: [0.8819171 0.62994079 0.49099025 0.87287156 0.73214286 0.73214286 0.75592895 1. 0.75592895 0.6000992 ] mean value: 0.7451962510483463 key: train_mcc value: [0.85442069 0.87000211 0.89863497 0.85434012 0.92787101 0.91277477 0.90025835 0.8555278 0.88360693 0.88668406] mean value: 0.8844120809526788 key: test_accuracy value: [0.9375 0.8125 0.73333333 0.93333333 0.86666667 0.86666667 0.86666667 1. 
0.86666667 0.8 ] mean value: 0.8683333333333334 key: train_accuracy value: [0.92647059 0.93382353 0.94890511 0.9270073 0.96350365 0.95620438 0.94890511 0.9270073 0.94160584 0.94160584] mean value: 0.9415038643194504 /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:163: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:166: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy ros_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: test_fscore value: [0.94117647 0.82352941 0.75 0.92307692 0.85714286 0.85714286 0.88888889 1. 0.88888889 0.82352941] mean value: 0.8753375709258062 key: train_fscore value: [0.92857143 0.93617021 0.95035461 0.92857143 0.96453901 0.95714286 0.95035461 0.92857143 0.94202899 0.94366197] mean value: 0.9429966539911687 key: test_precision value: [0.88888889 0.77777778 0.66666667 1. 0.85714286 0.85714286 0.8 1. 0.8 0.77777778] mean value: 0.8425396825396825 key: train_precision value: [0.90277778 0.90410959 0.93055556 0.91549296 0.94444444 0.94366197 0.91780822 0.90277778 0.92857143 0.90540541] mean value: 0.9195605127329032 key: test_recall value: [1. 0.875 0.85714286 0.85714286 0.85714286 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9178571428571428 key: train_recall value: [0.95588235 0.97058824 0.97101449 0.94202899 0.98550725 0.97101449 0.98529412 0.95588235 0.95588235 0.98529412] mean value: 0.9678388746803069 key: test_roc_auc value: [0.9375 0.8125 0.74107143 0.92857143 0.86607143 0.86607143 0.85714286 1. 
0.85714286 0.79464286] mean value: 0.8660714285714286 key: train_roc_auc value: [0.92647059 0.93382353 0.94874254 0.92689685 0.96334186 0.95609548 0.9491688 0.92721654 0.94170929 0.94192242] mean value: 0.941538789428815 key: test_jcc value: [0.88888889 0.7 0.6 0.85714286 0.75 0.75 0.8 1. 0.8 0.7 ] mean value: 0.7846031746031746 key: train_jcc value: [0.86666667 0.88 0.90540541 0.86666667 0.93150685 0.91780822 0.90540541 0.86666667 0.89041096 0.89333333] mean value: 0.8923870171541405 MCC on Blind test: 0.06 Accuracy on Blind test: 0.66 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.01548505 0.01318932 0.01122952 0.01181602 0.01175117 0.01169848 0.01112366 0.01132369 0.0109632 0.01237416] mean value: 0.012095427513122559 key: score_time value: [0.01040673 0.00819731 0.0085001 0.00783849 0.00782514 0.00842071 0.0079546 0.00785255 0.00782204 0.00846028] mean value: 0.008327794075012208 key: test_mcc value: [0.35 0.35 0.8 1. 0.79056942 0.8 0.5 0.5 0.25819889 1. 
] mean value: 0.6348768304789256 key: train_mcc value: [0.87044534 0.87035806 0.87044534 0.81836616 0.81836616 0.84412955 0.84615385 0.84615385 0.84615385 0.84615385] mean value: 0.8476726003234742 key: test_accuracy value: [0.66666667 0.66666667 0.88888889 1. 0.88888889 0.88888889 0.75 0.75 0.625 1. ] mean value: 0.8125 key: train_accuracy value: [0.93506494 0.93506494 0.93506494 0.90909091 0.90909091 0.92207792 0.92307692 0.92307692 0.92307692 0.92307692] mean value: 0.9237762237762238 key: test_fscore value: [0.66666667 0.66666667 0.88888889 1. 0.90909091 0.88888889 0.75 0.75 0.57142857 1. ] mean value: 0.8091630591630592 key: train_fscore value: [0.93506494 0.93670886 0.93506494 0.90666667 0.90666667 0.92105263 0.92307692 0.92307692 0.92307692 0.92307692] mean value: 0.9233532388109337 key: test_precision value: [0.6 0.6 0.8 1. 0.83333333 1. 0.75 0.75 0.66666667 1. ] mean value: 0.8 key: train_precision value: [0.94736842 0.925 0.94736842 0.91891892 0.91891892 0.92105263 0.92307692 0.92307692 0.92307692 0.92307692] mean value: 0.9270935003829741 key: test_recall value: [0.75 0.75 1. 1. 1. 0.8 0.75 0.75 0.5 1. ] mean value: 0.83 key: train_recall value: [0.92307692 0.94871795 0.92307692 0.89473684 0.89473684 0.92105263 0.92307692 0.92307692 0.92307692 0.92307692] mean value: 0.9197705802968961 key: test_roc_auc value: [0.675 0.675 0.9 1. 0.875 0.9 0.75 0.75 0.625 1. ] mean value: 0.8150000000000001 key: train_roc_auc value: [0.93522267 0.93488529 0.93522267 0.90890688 0.90890688 0.92206478 0.92307692 0.92307692 0.92307692 0.92307692] mean value: 0.9237516869095818 key: test_jcc value: [0.5 0.5 0.8 1. 0.83333333 0.8 0.6 0.6 0.4 1. 
] mean value: 0.7033333333333334 key: train_jcc value: [0.87804878 0.88095238 0.87804878 0.82926829 0.82926829 0.85365854 0.85714286 0.85714286 0.85714286 0.85714286] mean value: 0.8577816492450638 MCC on Blind test: 0.1 Accuracy on Blind test: 0.57 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.28448343 0.26222992 0.30907059 0.30741835 0.29078984 0.30970097 0.2858026 0.2821455 0.30858493 0.30905151] mean value: 0.2949277639389038 key: score_time value: [0.00840044 0.00826621 0.00892925 0.00815058 0.00974989 0.00875688 0.00868368 0.00955057 0.00915575 0.00842237] mean value: 0.008806562423706055 key: test_mcc value: [0.1 0.35 0.8 0.79056942 1. 1. 1. 0.5 0.57735027 1. ] mean value: 0.711791968423172 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.55555556 0.66666667 0.88888889 0.88888889 1. 1. 1. 0.75 0.75 1. ] mean value: 0.85 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.5 0.66666667 0.88888889 0.90909091 1. 1. 1. 0.75 0.66666667 1. 
] mean value: 0.8381313131313131 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.5 0.6 0.8 0.83333333 1. 1. 1. 0.75 1. 1. ] mean value: 0.8483333333333334 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.75 1. 1. 1. 1. 1. 0.75 0.5 1. ] mean value: 0.85 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.55 0.675 0.9 0.875 1. 1. 1. 0.75 0.75 1. ] mean value: 0.85 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.33333333 0.5 0.8 0.83333333 1. 1. 1. 0.6 0.5 1. ] mean value: 0.7566666666666667 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.63 Model_name: Gaussian NB Model func: GaussianNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.0089488 0.00875568 0.0074203 0.00725079 0.00705767 0.00712848 0.00707436 0.00722599 0.00702119 0.00702357] mean value: 0.007490682601928711 key: score_time value: [0.0106101 0.01032877 0.00850701 0.00852227 0.0084753 0.00805879 0.00836706 0.00811815 0.00836396 0.00837588] mean value: 0.008772730827331543 key: test_mcc value: [ 0.39528471 0.5976143 0.5976143 0.47809144 0.15811388 0.35 0.25819889 0.37796447 0.25819889 -0.25819889] mean value: 0.3212882006354006 key: train_mcc value: [0.54521744 0.52542209 0.52542209 0.53924899 0.54085245 0.53924899 0.52790958 0.54772256 0.58722022 0.60697698] mean value: 0.5485241391056351 key: test_accuracy value: [0.66666667 0.77777778 0.77777778 0.66666667 0.55555556 0.66666667 0.625 0.625 0.625 0.375 ] mean value: 0.6361111111111111 key: train_accuracy value: [0.72727273 0.71428571 0.71428571 0.72727273 0.74025974 0.72727273 0.71794872 0.73076923 0.75641026 0.76923077] mean value: 0.7325008325008325 key: test_fscore value: [0.4 0.66666667 0.66666667 0.57142857 0.5 0.66666667 0.57142857 0.4 0.57142857 0.28571429] mean value: 0.53 key: train_fscore value: [0.63157895 0.60714286 0.60714286 0.61818182 0.65517241 0.61818182 0.60714286 0.63157895 0.6779661 0.7 ] mean value: 0.6354088618017069 key: test_precision value: [1. 1. 1. 1. 0.66666667 0.75 0.66666667 1. 0.66666667 0.33333333] mean value: 0.8083333333333333 key: train_precision value: [1. 1. 1. 1. 0.95 1. 1. 1. 1. 1. 
] mean value: 0.995 key: test_recall value: [0.25 0.5 0.5 0.4 0.4 0.6 0.5 0.25 0.5 0.25] mean value: 0.415 key: train_recall value: [0.46153846 0.43589744 0.43589744 0.44736842 0.5 0.44736842 0.43589744 0.46153846 0.51282051 0.53846154] mean value: 0.4676788124156545 key: test_roc_auc value: [0.625 0.75 0.75 0.7 0.575 0.675 0.625 0.625 0.625 0.375] mean value: 0.6325 key: train_roc_auc value: [0.73076923 0.71794872 0.71794872 0.72368421 0.73717949 0.72368421 0.71794872 0.73076923 0.75641026 0.76923077] mean value: 0.732557354925776 key: test_jcc value: [0.25 0.5 0.5 0.4 0.33333333 0.5 0.4 0.25 0.4 0.16666667] mean value: 0.37 key: train_jcc value: [0.46153846 0.43589744 0.43589744 0.44736842 0.48717949 0.44736842 0.43589744 0.46153846 0.51282051 0.53846154] mean value: 0.46639676113360323 MCC on Blind test: 0.08 Accuracy on Blind test: 0.73 Model_name: Naive Bayes Model func: BernoulliNB()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00678349 0.00722671 0.00688362 0.00727487 0.00721765 0.00729203 0.00733399 0.00746918 0.00723958 0.00735378] mean value: 0.007207489013671875 key: score_time value: [0.00807953 0.00803757 0.00839972 0.00815082 0.00826144 0.0078733 0.00901914 0.0084033 0.00845194 0.00837016] mean value: 0.008304691314697266 key: test_mcc value: [-0.31622777 0.63245553 0.63245553 0.15811388 0.31622777 0.35 0.57735027 0.77459667 -0.25819889 0. ] mean value: 0.2866772995759719 key: train_mcc value: [0.53279352 0.50745677 0.5064147 0.45639039 0.53591229 0.42943967 0.51298918 0.46537892 0.59684919 0.41367015] mean value: 0.49572947743384127 key: test_accuracy value: [0.33333333 0.77777778 0.77777778 0.55555556 0.66666667 0.66666667 0.75 0.875 0.375 0.5 ] mean value: 0.6277777777777778 key: train_accuracy value: [0.76623377 0.75324675 0.75324675 0.72727273 0.76623377 0.71428571 0.75641026 0.73076923 0.79487179 0.70512821] mean value: 0.7467698967698968 key: test_fscore value: [0.4 0.8 0.8 0.5 0.72727273 0.66666667 0.66666667 0.85714286 0.44444444 0.5 ] mean value: 0.6362193362193362 key: train_fscore value: [0.775 0.7654321 0.75949367 0.73417722 0.775 0.71794872 0.75949367 0.74698795 0.80952381 0.72289157] mean value: 0.7565948701272275 key: test_precision value: [0.33333333 0.66666667 0.66666667 0.66666667 0.66666667 0.75 1. 1. 0.4 0.5 ] mean value: 0.665 key: train_precision value: [0.75609756 0.73809524 0.75 0.70731707 0.73809524 0.7 0.75 0.70454545 0.75555556 0.68181818] mean value: 0.7281524302256009 key: test_recall value: [0.5 1. 1. 
0.4 0.8 0.6 0.5 0.75 0.5 0.5 ] mean value: 0.655 key: train_recall value: [0.79487179 0.79487179 0.76923077 0.76315789 0.81578947 0.73684211 0.76923077 0.79487179 0.87179487 0.76923077] mean value: 0.7879892037786774 key: test_roc_auc value: [0.35 0.8 0.8 0.575 0.65 0.675 0.75 0.875 0.375 0.5 ] mean value: 0.635 key: train_roc_auc value: [0.76585695 0.75269906 0.75303644 0.72773279 0.7668691 0.7145749 0.75641026 0.73076923 0.79487179 0.70512821] mean value: 0.7467948717948718 key: test_jcc value: [0.25 0.66666667 0.66666667 0.33333333 0.57142857 0.5 0.5 0.75 0.28571429 0.33333333] mean value: 0.4857142857142857 key: train_jcc value: [0.63265306 0.62 0.6122449 0.58 0.63265306 0.56 0.6122449 0.59615385 0.68 0.56603774] mean value: 0.609198750037025 MCC on Blind test: 0.08 Accuracy on Blind test: 0.5 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier()
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00730515 0.00695133 0.00699258 0.00709629 0.00697374 0.00705719 0.0069747 0.00722718 0.00697184 0.00727534] mean value: 0.007082533836364746 key: score_time value: [0.00949669 0.00922227 0.00921607 0.00929904 0.00929213 0.00919104 0.00917578 0.00927401 0.00928402 0.00928831] mean value: 0.009273934364318847 key: test_mcc value: [-0.15811388 0.15811388 0.8 0.63245553 0.31622777 0.8 0.5 0.25819889 0.25819889 0.57735027] mean value: 0.4142431346734462 key: train_mcc value: [0.58541539 0.66239043 0.61039852 0.61039852 0.55870445 0.61066127 0.64187021 0.64102564 0.62050523 0.56577895] mean value: 0.6107148606822335 key: test_accuracy value: [0.44444444 0.55555556 0.88888889 0.77777778 0.66666667 0.88888889 0.75 0.625 0.625 0.75 ] mean value: 0.6972222222222222 key: train_accuracy value: [0.79220779 0.83116883 0.80519481 0.80519481 0.77922078 0.80519481 0.82051282 0.82051282 0.80769231 0.78205128] mean value: 0.804895104895105 key: test_fscore value: [0.28571429 0.6 0.88888889 0.75 0.72727273 0.88888889 0.75 0.57142857 0.57142857 0.66666667] mean value: 0.6700288600288601 key: train_fscore value: [0.78947368 0.83544304 0.81012658 0.8 0.77922078 0.80519481 0.825 0.82051282 0.81927711 0.79012346] mean value: 0.8074372274615954 key: test_precision value: [0.33333333 0.5 0.8 1. 0.66666667 1. 0.75 0.66666667 0.66666667 1. ] mean value: 0.7383333333333333 key: train_precision value: [0.81081081 0.825 0.8 0.81081081 0.76923077 0.79487179 0.80487805 0.82051282 0.77272727 0.76190476] mean value: 0.7970747089649529 key: test_recall value: [0.25 0.75 1. 
0.6 0.8 0.8 0.75 0.5 0.5 0.5 ] mean value: 0.645 key: train_recall value: [0.76923077 0.84615385 0.82051282 0.78947368 0.78947368 0.81578947 0.84615385 0.82051282 0.87179487 0.82051282] mean value: 0.8189608636977058 key: test_roc_auc value: [0.425 0.575 0.9 0.8 0.65 0.9 0.75 0.625 0.625 0.75 ] mean value: 0.7 key: train_roc_auc value: [0.79251012 0.83097166 0.80499325 0.80499325 0.77935223 0.80533063 0.82051282 0.82051282 0.80769231 0.78205128] mean value: 0.8048920377867747 key: test_jcc value: [0.16666667 0.42857143 0.8 0.6 0.57142857 0.8 0.6 0.4 0.4 0.5 ] mean value: 0.5266666666666666 key: train_jcc value: [0.65217391 0.7173913 0.68085106 0.66666667 0.63829787 0.67391304 0.70212766 0.69565217 0.69387755 0.65306122] mean value: 0.6774012472704161 MCC on Blind test: 0.06 Accuracy on Blind test: 0.65 Model_name: SVM Model func: SVC(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00880122 0.00814819 0.00756073 0.00777268 0.00736904 0.00765371 0.00792193 0.00778627 0.00790763 0.00790238] mean value: 0.007882380485534668 key: score_time value: [0.0086937 0.00912237 0.00853276 0.00825572 0.00864816 0.00843048 0.0086689 0.00874853 0.00819087 0.00858378] mean value: 0.008587527275085449 key: test_mcc value: [0.35 0.1 0.8 1. 0.5976143 0.8 0.77459667 1. 0. 0.77459667] mean value: 0.6196807643150164 key: train_mcc value: [0.84516739 0.82485566 0.84852502 0.848923 0.79675455 0.87044534 0.74456944 0.8720816 0.77563153 0.84726867] mean value: 0.8274222215949533 key: test_accuracy value: [0.66666667 0.55555556 0.88888889 1. 0.77777778 0.88888889 0.875 1. 0.5 0.875 ] mean value: 0.8027777777777778 key: train_accuracy value: [0.92207792 0.90909091 0.92207792 0.92207792 0.8961039 0.93506494 0.87179487 0.93589744 0.88461538 0.92307692] mean value: 0.9121878121878122 key: test_fscore value: [0.66666667 0.5 0.88888889 1. 0.83333333 0.88888889 0.85714286 1. 0.5 0.85714286] mean value: 0.7992063492063491 key: train_fscore value: [0.925 0.91566265 0.92682927 0.925 0.9 0.93506494 0.875 0.93670886 0.89156627 0.925 ] mean value: 0.9155831979779763 key: test_precision value: [0.6 0.5 0.8 1. 0.71428571 1. 1. 1. 0.5 1. ] mean value: 0.8114285714285714 key: train_precision value: [0.90243902 0.86363636 0.88372093 0.88095238 0.85714286 0.92307692 0.85365854 0.925 0.84090909 0.90243902] mean value: 0.8832975131316028 key: test_recall value: [0.75 0.5 1. 1. 1. 0.8 0.75 1. 
0.5 0.75] mean value: 0.805 key: train_recall value: [0.94871795 0.97435897 0.97435897 0.97368421 0.94736842 0.94736842 0.8974359 0.94871795 0.94871795 0.94871795] mean value: 0.950944669365722 key: test_roc_auc value: [0.675 0.55 0.9 1. 0.75 0.9 0.875 1. 0.5 0.875] mean value: 0.8025 key: train_roc_auc value: [0.9217274 0.90823212 0.92139001 0.92273954 0.89676113 0.93522267 0.87179487 0.93589744 0.88461538 0.92307692] mean value: 0.9121457489878543 key: test_jcc value: [0.5 0.33333333 0.8 1. 0.71428571 0.8 0.75 1. 0.33333333 0.75 ] mean value: 0.6980952380952381 key: train_jcc value: [0.86046512 0.84444444 0.86363636 0.86046512 0.81818182 0.87804878 0.77777778 0.88095238 0.80434783 0.86046512] mean value: 0.8448784740404756 MCC on Blind test: 0.09 Accuracy on Blind test: 0.52 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.32428074 0.29084134 0.39761615 0.38947153 0.38101339 0.47083735 0.4072361 0.57479668 0.46769285 0.39441442] mean value: 0.40982005596160886 key: score_time value: [0.01101065 0.01088691 0.01111293 0.01090026 0.0111537 0.01554298 0.0109508 0.01096511 0.01096678 0.01099515] mean value: 0.011448526382446289 key: test_mcc value: [0.1 0.35 0.8 0.8 0.31622777 0.8 0.5 0.77459667 0.25819889 0.77459667] mean value: 0.5473619994246965 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.55555556 0.66666667 0.88888889 0.88888889 0.66666667 0.88888889 0.75 0.875 0.625 0.875 ] mean value: 0.7680555555555555 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.5 0.66666667 0.88888889 0.88888889 0.72727273 0.88888889 0.75 0.88888889 0.57142857 0.85714286] mean value: 0.7628066378066378 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.5 0.6 0.8 1. 0.66666667 1. 0.75 0.8 0.66666667 1. ] mean value: 0.7783333333333333 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.5 0.75 1. 0.8 0.8 0.8 0.75 1. 0.5 0.75] mean value: 0.765 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.55 0.675 0.9 0.9 0.65 0.9 0.75 0.875 0.625 0.875] mean value: 0.77 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.33333333 0.5 0.8 0.8 0.57142857 0.8 0.6 0.8 0.4 0.75 ] mean value: 0.6354761904761905 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.54 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.00931001 0.00951576 0.00988293 0.00786901 0.00728393 0.01152086 0.00701046 0.00675249 0.00750613 0.01119971] mean value: 0.008785128593444824 key: score_time value: [0.01047301 0.01033044 0.00877452 0.00874352 0.00872278 0.01278138 0.00794363 0.00788903 0.00790906 0.0122633 ] mean value: 0.009583067893981934 key: test_mcc value: [0.63245553 1. 1. 1. 0.63245553 1. 1. 1. 0.77459667 1. ] mean value: 0.9039507733308835 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.77777778 1. 1. 1. 0.77777778 1. 1. 1. 0.875 1. ] mean value: 0.9430555555555555 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 1. 1. 1. 0.75 1. 1. 1. 0.85714286 1. ] mean value: 0.9407142857142857 key: train_fscore value: [1.
1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 1. 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9666666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.6 1. 1. 1. 0.75 1. ] mean value: 0.935 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8 1. 1. 1. 0.8 1. 1. 1. 0.875 1. ] mean value: 0.9475 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 1. 1. 1. 0.6 1. 1. 1. 0.75 1. ] mean value: 0.9016666666666666 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.83 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.07991552 0.07531953 0.07922387 0.07759547 0.07554555 0.07641506 0.08248544 0.07638526 0.07602501 0.07815456] mean value: 0.07770652770996093 key: score_time value: [0.01661062 0.01769233 0.01668501 0.01735091 0.01669407 0.01689482 0.01721072 0.01734948 0.0171845 0.01668453] mean value: 0.017035698890686034 key: test_mcc value: [0.55 0.35 0.8 0.8 0.79056942 0.8 1. 0.77459667 0.25819889 1. ] mean value: 0.7123364974030739 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.77777778 0.66666667 0.88888889 0.88888889 0.88888889 0.88888889 1. 0.875 0.625 1. ] mean value: 0.85 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.75 0.66666667 0.88888889 0.88888889 0.90909091 0.88888889 1. 0.88888889 0.57142857 1. ] mean value: 0.8452741702741703 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.75 0.6 0.8 1. 0.83333333 1. 1. 0.8 0.66666667 1. ] mean value: 0.845 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.75 1. 0.8 1. 0.8 1. 1. 0.5 1. ] mean value: 0.86 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.775 0.675 0.9 0.9 0.875 0.9 1. 0.875 0.625 1. ] mean value: 0.8525 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.6 0.5 0.8 0.8 0.83333333 0.8 1. 0.8 0.4 1. ] mean value: 0.7533333333333334 key: train_jcc value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.05 Accuracy on Blind test: 0.63 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00761938 0.00663662 0.00656319 0.00661063 0.00666404 0.00655985 0.006706 0.00655651 0.00683618 0.00663257] mean value: 0.006738495826721191 key: score_time value: [0.00826526 0.00768614 0.0077951 0.00779033 0.00775886 0.00782132 0.00778651 0.00774288 0.00777555 0.00771451] mean value: 0.007813644409179688 key: test_mcc value: [ 0.35 0.1 -0.15811388 0.1 -0.1 -0.5976143 0.25819889 0. 0. 0.25819889] mean value: 0.021066959181870643 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.66666667 0.55555556 0.44444444 0.55555556 0.44444444 0.22222222 0.625 0.5 0.5 0.625 ] mean value: 0.5138888888888888 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.66666667 0.5 0.28571429 0.6 0.44444444 0. 0.57142857 0.5 0.5 0.57142857] mean value: 0.463968253968254 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
mean value: 1.0 key: test_precision value: [0.6 0.5 0.33333333 0.6 0.5 0. 0.66666667 0.5 0.5 0.66666667] mean value: 0.48666666666666664 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.5 0.25 0.6 0.4 0. 0.5 0.5 0.5 0.5 ] mean value: 0.45 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.675 0.55 0.425 0.55 0.45 0.25 0.625 0.5 0.5 0.625] mean value: 0.515 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.5 0.33333333 0.16666667 0.42857143 0.28571429 0. 0.4 0.33333333 0.33333333 0.4 ] mean value: 0.3180952380952381 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.52 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42)
Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr...
'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [0.95179796 0.94567418 0.94950175 0.96822572 0.93672466 0.96356082 0.97707057 1.03818297 1.03070283 1.00553799] mean value: 0.9766979455947876 key: score_time value: [0.09188795 0.09431767 0.08775377 0.08800793 0.09016871 0.08714199 0.09610558 0.09580159 0.09604168 0.09132028] mean value: 0.09185471534729003 key: test_mcc value: [0.8 0.55 0.8 1. 0.55 1. 1. 0.77459667 0.77459667 1. ] mean value: 0.8249193338482967 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88888889 0.77777778 0.88888889 1. 0.77777778 1. 1. 0.875 0.875 1. ] mean value: 0.9083333333333333 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.75 0.88888889 1. 0.8 1. 1. 0.88888889 0.85714286 1. ] mean value: 0.9073809523809524 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.75 0.8 1. 0.8 1. 1. 0.8 1. 1. ] mean value: 0.895 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.75 1. 1. 0.8 1. 1. 1. 0.75 1. ] mean value: 0.93 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9 0.775 0.9 1. 0.775 1. 1. 0.875 0.875 1. ] mean value: 0.91 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.6 0.8 1. 0.66666667 1. 1. 0.8 0.75 1. 
] mean value: 0.8416666666666667 key: train_jcc value: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn(
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.75 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, 
verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.92752123 0.89317346 0.80733633 0.90238857 0.82283401 0.9298923 0.91012096 0.82151413 0.85942483 0.84982991] mean value: 0.8724035739898681 key: score_time value: [0.19612741 0.17702603 0.17312717 0.23580909 0.18489385 0.20000648 0.20895576 0.13845778 0.27115655 0.17170072] mean value: 0.19572608470916747 key: test_mcc value: [0.35 0.55 0.8 1. 0.55 1. 1. 0.5 0.77459667 1. ] mean value: 0.7524596669241483 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.97467943 1. 
] mean value: 0.9974679434480896 key: test_accuracy value: [0.66666667 0.77777778 0.88888889 1. 0.77777778 1. 1. 0.75 0.875 1. ] mean value: 0.8736111111111111 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 0.98717949 1. ] mean value: 0.9987179487179487 key: test_fscore value: [0.66666667 0.75 0.88888889 1. 0.8 1. 1. 0.75 0.85714286 1. ] mean value: 0.8712698412698413 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 0.98701299 1. ] mean value: 0.9987012987012986 key: test_precision value: [0.6 0.75 0.8 1. 0.8 1. 1. 0.75 1. 1. ] mean value: 0.87 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.75 0.75 1. 1. 0.8 1. 1. 0.75 0.75 1. ] mean value: 0.88 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.97435897 1. ] mean value: 0.9974358974358974 key: test_roc_auc value: [0.675 0.775 0.9 1. 0.775 1. 1. 0.75 0.875 1. ] mean value: 0.875 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.98717949 1. ] mean value: 0.9987179487179487 key: test_jcc value: [0.5 0.6 0.8 1. 0.66666667 1. 1. 0.6 0.75 1. ] mean value: 0.7916666666666666 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 0.97435897 1. 
] mean value: 0.9974358974358974 MCC on Blind test: 0.14 Accuracy on Blind test: 0.74 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01836395 0.00678182 0.00684881 0.00723457 0.00742078 0.00704598 0.00751448 0.00682497 0.00756073 0.0067997 ] mean value: 0.00823957920074463 key: score_time value: [0.01078892 0.00827122 0.00816345 0.00804162 0.00847411 0.00859022 0.00832844 0.00801682 0.00818849 0.00833344] mean value: 0.008519673347473144 key: test_mcc value: [-0.31622777 0.63245553 0.63245553 0.15811388 0.31622777 0.35 0.57735027 0.77459667 -0.25819889 0. 
] mean value: 0.2866772995759719 key: train_mcc value: [0.53279352 0.50745677 0.5064147 0.45639039 0.53591229 0.42943967 0.51298918 0.46537892 0.59684919 0.41367015] mean value: 0.49572947743384127 key: test_accuracy value: [0.33333333 0.77777778 0.77777778 0.55555556 0.66666667 0.66666667 0.75 0.875 0.375 0.5 ] mean value: 0.6277777777777778 key: train_accuracy value: [0.76623377 0.75324675 0.75324675 0.72727273 0.76623377 0.71428571 0.75641026 0.73076923 0.79487179 0.70512821] mean value: 0.7467698967698968 key: test_fscore value: [0.4 0.8 0.8 0.5 0.72727273 0.66666667 0.66666667 0.85714286 0.44444444 0.5 ] mean value: 0.6362193362193362 key: train_fscore value: [0.775 0.7654321 0.75949367 0.73417722 0.775 0.71794872 0.75949367 0.74698795 0.80952381 0.72289157] mean value: 0.7565948701272275 key: test_precision value: [0.33333333 0.66666667 0.66666667 0.66666667 0.66666667 0.75 1. 1. 0.4 0.5 ] mean value: 0.665 key: train_precision value: [0.75609756 0.73809524 0.75 0.70731707 0.73809524 0.7 0.75 0.70454545 0.75555556 0.68181818] mean value: 0.7281524302256009 key: test_recall value: [0.5 1. 1. 
0.4 0.8 0.6 0.5 0.75 0.5 0.5 ] mean value: 0.655 key: train_recall value: [0.79487179 0.79487179 0.76923077 0.76315789 0.81578947 0.73684211 0.76923077 0.79487179 0.87179487 0.76923077] mean value: 0.7879892037786774 key: test_roc_auc value: [0.35 0.8 0.8 0.575 0.65 0.675 0.75 0.875 0.375 0.5 ] mean value: 0.635 key: train_roc_auc value: [0.76585695 0.75269906 0.75303644 0.72773279 0.7668691 0.7145749 0.75641026 0.73076923 0.79487179 0.70512821] mean value: 0.7467948717948718 key: test_jcc value: [0.25 0.66666667 0.66666667 0.33333333 0.57142857 0.5 0.5 0.75 0.28571429 0.33333333] mean value: 0.4857142857142857 key: train_jcc value: [0.63265306 0.62 0.6122449 0.58 0.63265306 0.56 0.6122449 0.59615385 0.68 0.56603774] mean value: 0.609198750037025 MCC on Blind test: 0.08 Accuracy on Blind test: 0.5 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, 
random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.0487535 0.02997184 0.05316138 0.02903056 0.02904987 0.0322907 0.18433547 0.02987671 0.02673578 0.02919006] mean value: 0.049239587783813474 key: score_time value: [0.01463509 0.00997066 0.00984716 0.00954914 0.00983858 0.01014209 0.01031256 0.01060939 0.01125598 0.00966144] mean value: 0.010582208633422852 key: test_mcc value: [0.8 1. 1. 1. 0.8 1. 1. 0.77459667 0.57735027 1. ] mean value: 0.8951946938431109 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88888889 1. 1. 1. 0.88888889 1. 1. 0.875 0.75 1. ] mean value: 0.9402777777777778 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 1. 1. 1. 0.88888889 1. 1. 0.88888889 0.66666667 1. ] mean value: 0.9333333333333333 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 1. 1. 1. 1. 1. 1. 0.8 1. 1. ] mean value: 0.96 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.8 1. 1. 1. 0.5 1. ] mean value: 0.93 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9 1. 1. 1. 0.9 1. 1. 0.875 0.75 1. ] mean value: 0.9425 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 1. 1. 1. 0.8 1. 1. 0.8 0.5 1. ] mean value: 0.89 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 MCC on Blind test: 0.09 Accuracy on Blind test: 0.77 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', 
QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.00983548 0.01052451 0.0106616 0.01091266 0.010957 0.01091719 0.01095319 0.01184487 0.01091933 0.01090574] mean value: 0.010843157768249512 key: score_time value: [0.0105567 0.01009798 0.01041722 0.01042056 0.01047301 0.01049972 0.01047277 0.01058149 0.01053119 0.01043415] mean value: 0.010448479652404785 key: test_mcc value: [0.8 0.8 0.8 0.79056942 1. 0.55 1. 0.77459667 0.25819889 0.57735027] mean value: 0.7350715243220365 key: train_mcc value: [1. 1. 0.97434188 0.97435897 1. 0.97435897 0.94996791 1. 1. 1. ] mean value: 0.987302773890115 key: test_accuracy value: [0.88888889 0.88888889 0.88888889 0.88888889 1. 0.77777778 1. 0.875 0.625 0.75 ] mean value: 0.8583333333333333 key: train_accuracy value: [1. 1. 0.98701299 0.98701299 1. 0.98701299 0.97435897 1. 1. 1. ] mean value: 0.9935397935397935 key: test_fscore value: [0.88888889 0.88888889 0.88888889 0.90909091 1. 0.8 1. 0.88888889 0.57142857 0.66666667] mean value: 0.8502741702741703 key: train_fscore value: [1. 1. 0.98734177 0.98701299 1. 
0.98701299 0.975 1. 1. 1. ] mean value: 0.9936367746177872 key: test_precision value: [0.8 0.8 0.8 0.83333333 1. 0.8 1. 0.8 0.66666667 1. ] mean value: 0.85 key: train_precision value: [1. 1. 0.975 0.97435897 1. 0.97435897 0.95121951 1. 1. 1. ] mean value: 0.987493746091307 key: test_recall value: [1. 1. 1. 1. 1. 0.8 1. 1. 0.5 0.5] mean value: 0.88 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9 0.9 0.9 0.875 1. 0.775 1. 0.875 0.625 0.75 ] mean value: 0.86 key: train_roc_auc value: [1. 1. 0.98684211 0.98717949 1. 0.98717949 0.97435897 1. 1. 1. ] mean value: 0.9935560053981106 key: test_jcc value: [0.8 0.8 0.8 0.83333333 1. 0.66666667 1. 0.8 0.4 0.5 ] mean value: 0.76 key: train_jcc value: [1. 1. 0.975 0.97435897 1. 0.97435897 0.95121951 1. 1. 1. ] mean value: 0.987493746091307 MCC on Blind test: 0.06 Accuracy on Blind test: 0.67 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, 
min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.00919414 0.00788784 0.00760937 0.00748897 0.00734115 0.00743103 0.00754428 0.00746417 0.00744605 0.00730228] mean value: 0.007670927047729492 key: score_time value: [0.01072121 0.0088973 0.00895858 0.00848174 0.00847363 0.00847983 0.00860381 0.00846124 0.00856113 0.00795674] mean value: 0.008759522438049316 key: test_mcc value: [0.55 0.1 0.8 0.8 0.31622777 0.55 0.57735027 0.57735027 0. 0.57735027] mean value: 0.4848278573585716 key: train_mcc value: [0.61257733 0.66463964 0.6374073 0.55962522 0.63928106 0.58485583 0.64102564 0.56577895 0.66688593 0.56428809] mean value: 0.6136364988556005 key: test_accuracy value: [0.77777778 0.55555556 0.88888889 0.88888889 0.66666667 0.77777778 0.75 0.75 0.5 0.75 ] mean value: 0.7305555555555555 key: train_accuracy value: [0.80519481 0.83116883 0.81818182 0.77922078 0.81818182 0.79220779 0.82051282 0.78205128 0.83333333 0.78205128] mean value: 0.8062104562104563 key: test_fscore value: [0.75 0.5 0.88888889 0.88888889 0.72727273 0.8 0.66666667 0.66666667 0.5 0.66666667] mean value: 0.7055050505050505 key: train_fscore value: [0.8 0.82666667 0.81578947 0.76712329 0.80555556 0.78378378 0.82051282 0.77333333 0.83544304 0.78481013] mean value: 0.8013018085764565 key: test_precision value: [0.75 0.5 0.8 1. 0.66666667 0.8 1. 1. 0.5 1. ] mean value: 0.8016666666666666 key: train_precision value: [0.83333333 0.86111111 0.83783784 0.8 0.85294118 0.80555556 0.82051282 0.80555556 0.825 0.775 ] mean value: 0.8216847390376802 key: test_recall value: [0.75 0.5 1. 
0.8 0.8 0.8 0.5 0.5 0.5 0.5 ] mean value: 0.665 key: train_recall value: [0.76923077 0.79487179 0.79487179 0.73684211 0.76315789 0.76315789 0.82051282 0.74358974 0.84615385 0.79487179] mean value: 0.7827260458839406 key: test_roc_auc value: [0.775 0.55 0.9 0.9 0.65 0.775 0.75 0.75 0.5 0.75 ] mean value: 0.73 key: train_roc_auc value: [0.80566802 0.83164642 0.81848853 0.77867746 0.81747638 0.79183536 0.82051282 0.78205128 0.83333333 0.78205128] mean value: 0.8061740890688258 key: test_jcc value: [0.6 0.33333333 0.8 0.8 0.57142857 0.66666667 0.5 0.5 0.33333333 0.5 ] mean value: 0.5604761904761905 key: train_jcc value: [0.66666667 0.70454545 0.68888889 0.62222222 0.6744186 0.64444444 0.69565217 0.63043478 0.7173913 0.64583333] mean value: 0.6690497875621738 MCC on Blind test: 0.09 Accuracy on Blind test: 0.56 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, 
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00780702 0.00762272 0.00795984 0.00789499 0.00773883 0.00785589 0.00779939 0.00809336 0.00800991 0.00795817] mean value: 0.007874011993408203 key: score_time value: [0.00883174 0.00864387 0.00875735 0.00865006 0.00847197 0.00864172 0.00876093 0.00859165 0.00871468 0.00850964] mean value: 0.008657360076904297 key: test_mcc value: [0.31622777 0.15811388 0.8 0.79056942 1. 1. 0.77459667 0.5 0.5 1. ] mean value: 0.6839507733308835 key: train_mcc value: [1. 0.92480439 0.94935876 1. 0.94804318 0.94935876 0.87904907 0.85634884 0.97467943 0.90219371] mean value: 0.93838361447064 key: test_accuracy value: [0.66666667 0.55555556 0.88888889 0.88888889 1. 1. 0.875 0.75 0.75 1. ] mean value: 0.8375 key: train_accuracy value: [1. 0.96103896 0.97402597 1. 0.97402597 0.97402597 0.93589744 0.92307692 0.98717949 0.94871795] mean value: 0.9677988677988678 key: test_fscore value: [0.57142857 0.6 0.88888889 0.90909091 1. 1. 0.88888889 0.75 0.75 1. ] mean value: 0.8358297258297258 key: train_fscore value: [1. 0.96296296 0.97368421 1. 0.97368421 0.97435897 0.93975904 0.92857143 0.98734177 0.95121951] mean value: 0.9691582107437596 key: test_precision value: [0.66666667 0.5 0.8 0.83333333 1. 1. 0.8 0.75 0.75 1. ] mean value: 0.81 key: train_precision value: [1. 0.92857143 1. 1. 0.97368421 0.95 0.88636364 0.86666667 0.975 0.90697674] mean value: 0.9487262686314094 key: test_recall value: [0.5 0.75 1. 1. 1. 1. 1. 0.75 0.75 1. ] mean value: 0.875 key: train_recall value: [1. 1. 0.94871795 1. 0.97368421 1. 1. 1. 1. 1. ] mean value: 0.9922402159244265 key: test_roc_auc value: [0.65 0.575 0.9 0.875 1. 1. 
0.875 0.75 0.75 1. ] mean value: 0.8375 key: train_roc_auc value: [1. 0.96052632 0.97435897 1. 0.97402159 0.97435897 0.93589744 0.92307692 0.98717949 0.94871795] mean value: 0.9678137651821862 key: test_jcc value: [0.4 0.42857143 0.8 0.83333333 1. 1. 0.8 0.6 0.6 1. ] mean value: 0.7461904761904762 key: train_jcc value: [1. 0.92857143 0.94871795 1. 0.94871795 0.95 0.88636364 0.86666667 0.975 0.90697674] mean value: 0.9411014373223675 MCC on Blind test: 0.09 Accuracy on Blind test: 0.53 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive 
Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00939584 0.00922012 0.00787425 0.00773025 0.00760698 0.00704384 0.00760889 0.00750351 0.00746274 0.00742221] mean value: 0.007886862754821778 key: score_time value: [0.01045752 0.01001 0.00872231 0.0086472 0.00853968 0.00859547 0.0085783 0.00867987 0.00851488 0.008075 ] mean value: 0.0088820219039917 key: test_mcc value: [0.5976143 0.55 0.8 0.79056942 0.63245553 1. 0.77459667 0.5 0. 1. ] mean value: 0.6645235920984451 key: train_mcc value: [0.90109146 1. 0.97435897 0.90109146 0.70243936 0.75611265 0.94996791 1. 
0.46770717 0.9258201 ] mean value: 0.8578589074717703 key: test_accuracy value: [0.77777778 0.77777778 0.88888889 0.88888889 0.77777778 1. 0.875 0.75 0.5 1. ] mean value: 0.8236111111111111 key: train_accuracy value: [0.94805195 1. 0.98701299 0.94805195 0.83116883 0.87012987 0.97435897 1. 0.67948718 0.96153846] mean value: 0.91998001998002 key: test_fscore value: [0.66666667 0.75 0.88888889 0.90909091 0.75 1. 0.88888889 0.75 0.6 1. ] mean value: 0.8203535353535354 key: train_fscore value: [0.94594595 1. 0.98701299 0.95 0.79365079 0.85294118 0.975 1. 0.75728155 0.96296296] mean value: 0.9224795419441336 key: test_precision value: [1. 0.75 0.8 0.83333333 1. 1. 0.8 0.75 0.5 1. ] mean value: 0.8433333333333334 key: train_precision value: [1. 1. 1. 0.9047619 1. 0.96666667 0.95121951 1. 0.609375 0.92857143] mean value: 0.9360594512195122 key: test_recall value: [0.5 0.75 1. 1. 0.6 1. 1. 0.75 0.75 1. ] mean value: 0.835 key: train_recall value: [0.8974359 1. 0.97435897 1. 0.65789474 0.76315789 1. 1. 1. 1. ] mean value: 0.9292847503373819 key: test_roc_auc value: [0.75 0.775 0.9 0.875 0.8 1. 0.875 0.75 0.5 1. ] mean value: 0.8225 key: train_roc_auc value: [0.94871795 1. 0.98717949 0.94871795 0.82894737 0.86875843 0.97435897 1. 0.67948718 0.96153846] mean value: 0.9197705802968961 key: test_jcc value: [0.5 0.6 0.8 0.83333333 0.6 1. 0.8 0.6 0.42857143 1. ] mean value: 0.7161904761904762 key: train_jcc value: [0.8974359 1. 0.97435897 0.9047619 0.65789474 0.74358974 0.95121951 1. 
0.609375 0.92857143] mean value: 0.8667207197755176 MCC on Blind test: 0.11 Accuracy on Blind test: 0.65 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.07140326 0.05798793 0.05990863 0.05765224 0.0598948 0.06153941 0.05800366 0.05805969 0.05758524 0.05807018] mean value: 0.06001050472259521 key: score_time value: [0.0154562 0.01460052 0.01546645 0.01415467 0.01506758 0.01572824 0.01406693 0.01429629 0.01435161 0.01549411] mean value: 0.01486825942993164 key: test_mcc value: [0.8 1. 1. 1. 0.63245553 1. 1. 1. 0.77459667 1. ] mean value: 0.9207052201275159 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.88888889 1. 1. 1. 0.77777778 1. 1. 1. 0.875 1. ] mean value: 0.9541666666666666 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 1. 1. 1. 0.75 1. 1. 1. 0.88888889 1. ] mean value: 0.9527777777777777 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 1. 1. 1. 1. 1. 1. 1. 0.8 1. ] mean value: 0.96 key: train_precision value: [1. 1. 1. 
1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.6 1. 1. 1. 1. 1. ] mean value: 0.96 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9 1. 1. 1. 0.8 1. 1. 1. 0.875 1. ] mean value: 0.9575 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 1. 1. 1. 0.6 1. 1. 1. 0.8 1. ] mean value: 0.92 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.81 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.02970028 0.02384472 0.02287769 0.03229713 0.02531958 0.02179098 0.02337933 0.03095937 0.02274895 0.02188182] mean value: 0.025479984283447266 key: score_time value: [0.0158186 0.0168395 0.01607776 0.02253652 0.02111721 0.01882815 0.02102876 0.02306414 0.01536655 0.0155549 ] mean value: 0.018623208999633788 key: test_mcc value: [0.8 1. 1. 1. 0.63245553 1. 1. 0.77459667 0.57735027 1. ] mean value: 0.8784402470464785 key: train_mcc value: [1. 1. 0.97435897 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9974358974358974 key: test_accuracy value: [0.88888889 1. 
1. 1. 0.77777778 1. 1. 0.875 0.75 1. ] mean value: 0.9291666666666667 key: train_accuracy value: [1. 1. 0.98701299 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9987012987012986 key: test_fscore value: [0.88888889 1. 1. 1. 0.75 1. 1. 0.88888889 0.66666667 1. ] mean value: 0.9194444444444444 key: train_fscore value: [1. 1. 0.98701299 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9987012987012986 key: test_precision value: [0.8 1. 1. 1. 1. 1. 1. 0.8 1. 1. ] mean value: 0.96 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.6 1. 1. 1. 0.5 1. ] mean value: 0.91 key: train_recall value: [1. 1. 0.97435897 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9974358974358974 key: test_roc_auc value: [0.9 1. 1. 1. 0.8 1. 1. 0.875 0.75 1. ] mean value: 0.9325 key: train_roc_auc value: [1. 1. 0.98717949 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9987179487179487 key: test_jcc value: [0.8 1. 1. 1. 0.6 1. 1. 0.8 0.5 1. ] mean value: 0.87 key: train_jcc value: [1. 1. 0.97435897 1. 1. 1. 1. 1. 1. 1. 
] mean value: 0.9974358974358974 MCC on Blind test: 0.12 Accuracy on Blind test: 0.84 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.01199746 0.01234484 0.01242924 0.01289916 0.01300168 0.01250005 0.01238441 0.01239324 0.01242089 0.01329017] mean value: 0.012566113471984863 key: score_time value: [0.0101943 0.01012278 0.01048732 0.01058817 0.0106461 0.01056409 0.01057553 0.01059437 0.01054907 0.01066661] mean value: 0.010498833656311036 key: test_mcc value: [0.55 0.35 0.8 0.79056942 0.79056942 0.8 0.77459667 0.5 0.25819889 0.77459667] mean value: 0.6388531058314317 key: train_mcc value: [1. 1. 0.97434188 1. 1. 0.97435897 0.97467943 1. 0.97467943 1. ] mean value: 0.9898059726472239 key: test_accuracy value: [0.77777778 0.66666667 0.88888889 0.88888889 0.88888889 0.88888889 0.875 0.75 0.625 0.875 ] mean value: 0.8125 key: train_accuracy value: [1. 1. 0.98701299 1. 1. 0.98701299 0.98717949 1. 0.98717949 1. 
] mean value: 0.9948384948384948 key: test_fscore value: [0.75 0.66666667 0.88888889 0.90909091 0.90909091 0.88888889 0.85714286 0.75 0.57142857 0.85714286] mean value: 0.8048340548340548 key: train_fscore value: [1. 1. 0.98734177 1. 1. 0.98701299 0.98734177 1. 0.98701299 1. ] mean value: 0.9948709518329771 key: test_precision value: [0.75 0.6 0.8 0.83333333 0.83333333 1. 1. 0.75 0.66666667 1. ] mean value: 0.8233333333333334 key: train_precision value: [1. 1. 0.975 1. 1. 0.97435897 0.975 1. 1. 1. ] mean value: 0.9924358974358974 key: test_recall value: [0.75 0.75 1. 1. 1. 0.8 0.75 0.75 0.5 0.75] mean value: 0.805 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 0.97435897 1. ] mean value: 0.9974358974358974 key: test_roc_auc value: [0.775 0.675 0.9 0.875 0.875 0.9 0.875 0.75 0.625 0.875] mean value: 0.8125 key: train_roc_auc value: [1. 1. 0.98684211 1. 1. 0.98717949 0.98717949 1. 0.98717949 1. ] mean value: 0.9948380566801619 key: test_jcc value: [0.6 0.5 0.8 0.83333333 0.83333333 0.8 0.75 0.6 0.4 0.75 ] mean value: 0.6866666666666666 key: train_jcc value: [1. 1. 0.975 1. 1. 0.97435897 0.975 1. 0.97435897 1. 
] mean value: 0.9898717948717949 MCC on Blind test: 0.1 Accuracy on Blind test: 0.59 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.0684216 0.06198835 0.06150103 0.06125021 0.05035377 0.06086302 0.06419754 0.05684948 0.05581045 0.06373596] mean value: 0.06049714088439941 key: score_time value: [0.00865507 0.00866818 0.00822926 0.008461 0.00909662 0.00912499 0.00889039 0.00892878 0.0091598 0.00874305] mean value: 0.008795714378356934 key: test_mcc value: [0.63245553 1. 1. 1. 0.63245553 1. 1. 1. 0.77459667 1. ] mean value: 0.9039507733308835 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.77777778 1. 1. 1. 0.77777778 1. 1. 1. 0.875 1. ] mean value: 0.9430555555555555 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.8 1. 1. 1. 0.75 1. 1. 1. 0.85714286 1. ] mean value: 0.9407142857142857 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.66666667 1. 1. 1. 1. 1. 1. 1. 1. 1. 
] mean value: 0.9666666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 0.6 1. 1. 1. 0.75 1. ] mean value: 0.935 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.8 1. 1. 1. 0.8 1. 1. 1. 0.875 1. ] mean value: 0.9475 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.66666667 1. 1. 1. 0.6 1. 1. 1. 0.75 1. ] mean value: 0.9016666666666666 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.76 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, 
validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear") /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1327: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 due to no predicted samples. Use `zero_division` parameter to control this behavior. _warn_prf(average, modifier, msg_start, len(result))
Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00811529 0.00809407 0.01050186 0.00907612 0.00753951 0.0079627 0.00725293 0.00733447 0.00738692 0.00783324] mean value: 0.008109712600708007 key: score_time value: [0.01106501 0.01026535 0.00954747 0.00802541 0.0085423 0.00829649 0.00797725 0.00805521 0.00838113 0.00803828] mean value: 0.008819389343261718 key: test_mcc value: [ 0.05976143 -0.31622777 0.31622777 0. 0.47809144 -0.05976143 0.25819889 0.57735027 0. 0. ] mean value: 0.13136406026705444 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.55555556 0.44444444 0.66666667 0.44444444 0.66666667 0.44444444 0.625 0.75 0.5 0.5 ] mean value: 0.5597222222222222 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.33333333 0. 0.57142857 0. 0.57142857 0.28571429 0.57142857 0.66666667 0.33333333 0. ] mean value: 0.33333333333333337 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.5 0. 0.66666667 0. 1. 0.5 0.66666667 1. 0.5 0. ] mean value: 0.48333333333333334 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [0.25 0. 0.5 0. 0.4 0.2 0.5 0.5 0.25 0. ] mean value: 0.26 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.525 0.4 0.65 0.5 0.7 0.475 0.625 0.75 0.5 0.5 ] mean value: 0.5625 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.2 0. 0.4 0. 0.4 0.16666667 0.4 0.5 0.2 0. 
] mean value: 0.22666666666666668 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.03 Accuracy on Blind test: 0.51 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', 
GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01002216 0.0099225 0.00757122 0.00743675 0.00746179 0.00744224 0.00728893 0.00753307 0.00745034 0.00752497] mean value: 0.007965397834777833 key: score_time value: [0.01054311 0.00975561 0.008003 0.00801349 0.00793147 0.0078249 0.00790191 0.00796008 0.00788474 0.00783968] mean value: 0.008365797996520995 key: test_mcc value: [0.55 0.35 0.8 1. 1. 1. 0.77459667 0.5 0.5 1. ] mean value: 0.7474596669241483 key: train_mcc value: [0.97435897 0.94804318 0.89608637 0.92240216 0.94804318 0.92240216 0.89861829 0.94871795 1. 0.97467943] mean value: 0.9433351706022929 key: test_accuracy value: [0.77777778 0.66666667 0.88888889 1. 1. 1. 0.875 0.75 0.75 1. ] mean value: 0.8708333333333333 key: train_accuracy value: [0.98701299 0.97402597 0.94805195 0.96103896 0.97402597 0.96103896 0.94871795 0.97435897 1. 
0.98717949] mean value: 0.9715451215451215 key: test_fscore value: [0.75 0.66666667 0.88888889 1. 1. 1. 0.88888889 0.75 0.75 1. ] mean value: 0.8694444444444445 key: train_fscore value: [0.98701299 0.97435897 0.94871795 0.96103896 0.97368421 0.96103896 0.95 0.97435897 1. 0.98701299] mean value: 0.971722400406611 key: test_precision value: [0.75 0.6 0.8 1. 1. 1. 0.8 0.75 0.75 1. ] mean value: 0.845 key: train_precision value: [1. 0.97435897 0.94871795 0.94871795 0.97368421 0.94871795 0.92682927 0.97435897 1. 1. ] mean value: 0.9695385273690793 key: test_recall value: [0.75 0.75 1. 1. 1. 1. 1. 0.75 0.75 1. ] mean value: 0.9 /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:183: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:186: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rus_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True) key: train_recall value: [0.97435897 0.97435897 0.94871795 0.97368421 0.97368421 0.97368421 0.97435897 0.97435897 1. 0.97435897] mean value: 0.9741565452091768 key: test_roc_auc value: [0.775 0.675 0.9 1. 1. 1. 0.875 0.75 0.75 1. ] mean value: 0.8725 key: train_roc_auc value: [0.98717949 0.97402159 0.94804318 0.96120108 0.97402159 0.96120108 0.94871795 0.97435897 1. 0.98717949] mean value: 0.9715924426450743 key: test_jcc value: [0.6 0.5 0.8 1. 1. 1. 0.8 0.6 0.6 1. ] mean value: 0.79 key: train_jcc value: [0.97435897 0.95 0.90243902 0.925 0.94871795 0.925 0.9047619 0.95 1. 
0.97435897] mean value: 0.9454636826588046 MCC on Blind test: 0.06 Accuracy on Blind test: 0.67 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.07551575 0.0629611 0.06268406 0.06178474 0.06372309 0.06233025 0.06263781 0.06301665 0.06373763 0.06232142] mean value: 0.06407124996185302 key: score_time value: [0.00872803 0.00875998 0.00883865 0.0087626 0.00883889 0.00868249 0.00857091 0.00899076 0.00884104 0.00882053] mean value: 0.008783388137817382 key: test_mcc value: [0.8 0.35 0.8 1. 1. 0.8 1. 0.77459667 0.5 0.77459667] mean value: 0.7799193338482967 key: train_mcc value: [0.94804318 0.94804318 0.94804318 0.92240216 0.94804318 0.94804318 0.94871795 1. 1. 0.94871795] mean value: 0.9560053981106613 key: test_accuracy value: [0.88888889 0.66666667 0.88888889 1. 1. 0.88888889 1. 0.875 0.75 0.875 ] mean value: 0.8833333333333333 key: train_accuracy value: [0.97402597 0.97402597 0.97402597 0.96103896 0.97402597 0.97402597 0.97435897 1. 1. 0.97435897] mean value: 0.977988677988678 key: test_fscore value: [0.88888889 0.66666667 0.88888889 1. 1. 0.88888889 1. 
0.88888889 0.75 0.85714286] mean value: 0.8829365079365079 key: train_fscore value: [0.97435897 0.97435897 0.97435897 0.96103896 0.97368421 0.97368421 0.97435897 1. 1. 0.97435897] mean value: 0.9780202253886464 key: test_precision value: [0.8 0.6 0.8 1. 1. 1. 1. 0.8 0.75 1. ] mean value: 0.875 key: train_precision value: [0.97435897 0.97435897 0.97435897 0.94871795 0.97368421 0.97368421 0.97435897 1. 1. 0.97435897] mean value: 0.9767881241565453 key: test_recall value: [1. 0.75 1. 1. 1. 0.8 1. 1. 0.75 0.75] mean value: 0.905 key: train_recall value: [0.97435897 0.97435897 0.97435897 0.97368421 0.97368421 0.97368421 0.97435897 1. 1. 0.97435897] mean value: 0.9792847503373819 key: test_roc_auc value: [0.9 0.675 0.9 1. 1. 0.9 1. 0.875 0.75 0.875] mean value: 0.8875 key: train_roc_auc value: [0.97402159 0.97402159 0.97402159 0.96120108 0.97402159 0.97402159 0.97435897 1. 1. 0.97435897] mean value: 0.9780026990553307 key: test_jcc value: [0.8 0.5 0.8 1. 1. 0.8 1. 0.8 0.6 0.75] mean value: 0.805 key: train_jcc value: [0.95 0.95 0.95 0.925 0.94871795 0.94871795 0.95 1. 1. 
0.95 ] mean value: 0.9572435897435897 MCC on Blind test: 0.06 Accuracy on Blind test: 0.68 Model_name: Logistic Regression Model func: LogisticRegression(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegression(random_state=42))]) key: fit_time value: [0.0179987 0.01539111 0.01446462 0.01311445 0.01305366 0.01297665 0.01468205 0.01300359 0.01401901 0.01403546] mean value: 0.014273929595947265 key: score_time value: [0.01068687 0.00844073 0.00901723 0.00842404 0.00852823 0.00848007 0.00915885 0.00892854 0.00850725 0.00879216] mean value: 0.008896398544311523 key: test_mcc value: [0.51639778 0.62994079 0.73214286 0.49099025 0.87287156 0.87287156 0.46428571 0.32732684 0.64465837 0.875 ] mean value: 0.6426485720764821 key: train_mcc value: [0.808911 0.79446135 0.78111679 0.82629176 0.83951407 0.76668815 0.81031543 0.8251228 0.81092683 0.81027501] mean value: 0.8073623185403057 key: test_accuracy value: [0.75 0.8125 0.86666667 0.73333333 0.93333333 0.93333333 0.73333333 0.66666667 0.8 0.93333333] mean value: 0.81625 key: train_accuracy value: [0.90441176 0.89705882 0.89051095 0.91240876 0.91970803 0.88321168 0.90510949 0.91240876 0.90510949 0.90510949] mean value: 0.903504723057106 key: test_fscore 
value: [0.77777778 0.8 0.85714286 0.75 0.92307692 0.92307692 0.75 0.70588235 0.84210526 0.93333333] mean value: 0.8262395430506886 key: train_fscore value: [0.9037037 0.89552239 0.89051095 0.91044776 0.91970803 0.88571429 0.90510949 0.91044776 0.90225564 0.9037037 ] mean value: 0.9027123709820484 key: test_precision value: [0.7 0.85714286 0.85714286 0.66666667 1. 1. 0.75 0.66666667 0.72727273 1. ] mean value: 0.8224891774891775 key: train_precision value: [0.91044776 0.90909091 0.89705882 0.93846154 0.92647059 0.87323944 0.89855072 0.92424242 0.92307692 0.91044776] mean value: 0.911108689028196 key: test_recall value: [0.875 0.75 0.85714286 0.85714286 0.85714286 0.85714286 0.75 0.75 1. 0.875 ] mean value: 0.8428571428571429 key: train_recall value: [0.89705882 0.88235294 0.88405797 0.88405797 0.91304348 0.89855072 0.91176471 0.89705882 0.88235294 0.89705882] mean value: 0.8947357203751065 key: test_roc_auc value: [0.75 0.8125 0.86607143 0.74107143 0.92857143 0.92857143 0.73214286 0.66071429 0.78571429 0.9375 ] mean value: 0.8142857142857143 key: train_roc_auc value: [0.90441176 0.89705882 0.8905584 0.91261722 0.91975703 0.88309889 0.90515772 0.91229753 0.90494459 0.90505115] mean value: 0.9034953111679455 key: test_jcc value: [0.63636364 0.66666667 0.75 0.6 0.85714286 0.85714286 0.6 0.54545455 0.72727273 0.875 ] mean value: 0.711504329004329 key: train_jcc value: [0.82432432 0.81081081 0.80263158 0.83561644 0.85135135 0.79487179 0.82666667 0.83561644 0.82191781 0.82432432] mean value: 0.8228131536228147 MCC on Blind test: 0.12 Accuracy on Blind test: 0.66 Model_name: Logistic RegressionCV Model func: LogisticRegressionCV(random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:444: ConvergenceWarning: lbfgs failed to converge (status=1): STOP: TOTAL NO. of ITERATIONS REACHED LIMIT. 
Increase the number of iterations (max_iter) or scale the data as shown in: https://scikit-learn.org/stable/modules/preprocessing.html Please also refer to the documentation for alternative solver options: https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression n_iter_i = _check_optimize_result( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LogisticRegressionCV(random_state=42))]) key: fit_time value: [0.36799169 0.37216139 0.3838408 0.37895703 0.3886342 0.3847723 0.39137197 0.39352298 0.38147902 0.38831353] mean value: 0.3831044912338257 key: score_time value: [0.00858855 0.00922418 0.00917625 0.00932384 0.00937819 0.00943565 0.00947142 0.00946307 0.00953293 0.00940537] mean value: 0.009299945831298829 key: test_mcc value: [0.62994079 0.8819171 0.875 0.49099025 1. 0.73214286 0.6000992 0.87287156 0.64465837 0.875 ] mean value: 0.7602620132524002 key: train_mcc value: [0.85294118 1. 1. 0.88360693 1. 1. 1. 1. 0.88355744 1. ] mean value: 0.9620105545903546 key: test_accuracy value: [0.8125 0.9375 0.93333333 0.73333333 1. 0.86666667 0.8 0.93333333 0.8 0.93333333] mean value: 0.875 key: train_accuracy value: [0.92647059 1. 1. 0.94160584 1. 1. 1. 1. 0.94160584 1. 
] mean value: 0.9809682267067411 key: test_fscore value: [0.82352941 0.94117647 0.93333333 0.75 1. 0.85714286 0.82352941 0.94117647 0.84210526 0.93333333] mean value: 0.8845326551673302 key: train_fscore value: [0.92647059 1. 1. 0.94117647 1. 1. 1. 1. 0.94029851 1. ] mean value: 0.9807945566286216 key: test_precision value: [0.77777778 0.88888889 0.875 0.66666667 1. 0.85714286 0.77777778 0.88888889 0.72727273 1. ] mean value: 0.8459415584415584 key: train_precision value: [0.92647059 1. 1. 0.95522388 1. 1. 1. 1. 0.95454545 1. ] mean value: 0.9836239923377763 key: test_recall value: [0.875 1. 1. 0.85714286 1. 0.85714286 0.875 1. 1. 0.875 ] mean value: 0.9339285714285714 key: train_recall value: [0.92647059 1. 1. 0.92753623 1. 1. 1. 1. 0.92647059 1. ] mean value: 0.9780477408354646 key: test_roc_auc value: [0.8125 0.9375 0.9375 0.74107143 1. 0.86607143 0.79464286 0.92857143 0.78571429 0.9375 ] mean value: 0.8741071428571429 key: train_roc_auc value: [0.92647059 1. 1. 0.94170929 1. 1. 1. 1. 0.94149616 1. ] mean value: 0.9809676044330776 key: test_jcc value: [0.7 0.88888889 0.875 0.6 1. 0.75 0.7 0.88888889 0.72727273 0.875 ] mean value: 0.8005050505050505 key: train_jcc value: [0.8630137 1. 1. 0.88888889 1. 1. 1. 1. 0.88732394 1. 
] mean value: 0.9639226531180998 MCC on Blind test: 0.0 Accuracy on Blind test: 0.68 Model_name: Gaussian NB Model func: GaussianNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianNB())]) key: fit_time value: [0.00949454 0.00907326 0.00779319 0.00751305 0.00748301 0.0074904 0.00753403 0.00772738 0.00752568 0.00749397] mean value: 0.007912850379943848 key: score_time value: [0.01055598 0.01037431 0.00883818 0.00855374 0.00868988 0.00856733 0.00868702 0.00864244 0.00872922 0.00876641] mean value: 0.009040451049804688 key: test_mcc value: [0.37796447 0.25 0.60714286 0.26189246 0.46428571 0.56407607 0.19642857 0.41931393 0.21821789 0.34247476] mean value: 0.3701796738627931 key: train_mcc value: [0.59233863 0.52313884 0.49254979 0.53036644 0.56781069 0.53654458 0.71021843 0.58848522 0.56432157 0.58903512] mean value: 0.5694809310571065 key: test_accuracy value: [0.625 0.625 0.8 0.6 0.73333333 0.73333333 0.6 0.66666667 0.6 0.66666667] mean value: 0.665 key: train_accuracy value: [0.78676471 0.75 0.72992701 0.75912409 0.76642336 0.75182482 0.84671533 0.7810219 0.77372263 0.77372263] mean value: 0.7719246457707171 key: test_fscore value: [0.72727273 0.625 0.8 0.66666667 0.71428571 
0.77777778 0.625 0.76190476 0.57142857 0.73684211] mean value: 0.7006178324599377 key: train_fscore value: [0.81045752 0.78205128 0.77300613 0.78431373 0.80246914 0.79012346 0.82644628 0.80769231 0.7394958 0.80745342] mean value: 0.7923509054595705 key: test_precision value: [0.57142857 0.625 0.75 0.54545455 0.71428571 0.63636364 0.625 0.61538462 0.66666667 0.63636364] mean value: 0.6385947385947386 key: train_precision value: [0.72941176 0.69318182 0.67021277 0.71428571 0.69892473 0.68817204 0.94339623 0.71590909 0.8627451 0.69892473] mean value: 0.7415163983870607 key: test_recall value: [1. 0.625 0.85714286 0.85714286 0.71428571 1. 0.625 1. 0.5 0.875 ] mean value: 0.8053571428571429 key: train_recall value: [0.91176471 0.89705882 0.91304348 0.86956522 0.94202899 0.92753623 0.73529412 0.92647059 0.64705882 0.95588235] mean value: 0.8725703324808184 key: test_roc_auc value: [0.625 0.625 0.80357143 0.61607143 0.73214286 0.75 0.59821429 0.64285714 0.60714286 0.65178571] mean value: 0.6651785714285714 key: train_roc_auc value: [0.78676471 0.75 0.72858056 0.75831202 0.76513214 0.75053282 0.84590793 0.78207587 0.77280477 0.77504263] mean value: 0.7715153452685422 key: test_jcc value: [0.57142857 0.45454545 0.66666667 0.5 0.55555556 0.63636364 0.45454545 0.61538462 0.4 0.58333333] mean value: 0.5437823287823288 key: train_jcc value: [0.68131868 0.64210526 0.63 0.64516129 0.67010309 0.65306122 0.70422535 0.67741935 0.58666667 0.67708333] mean value: 0.6567144259023844 MCC on Blind test: 0.02 Accuracy on Blind test: 0.47 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), 
('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 
'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.00795698 0.00770211 0.00773144 0.00778985 0.00778174 0.00784087 0.00769806 0.0066731 0.00672388 0.0066855 ] mean value: 0.007458353042602539 key: score_time value: [0.00865579 0.00877738 0.00871015 0.00862479 0.00876021 0.00872946 0.00876045 0.00781512 0.00778174 0.00782132] mean value: 0.008443641662597656 key: test_mcc value: [ 0.25 -0.25 0.73214286 0.09449112 0.75592895 0.49099025 0.33928571 -0.13363062 0.33928571 0.19642857] mean value: 0.2814922553488389 key: train_mcc value: [0.50195781 0.54894692 0.44946013 0.47724794 0.37278745 0.44522592 0.41602728 0.48933032 0.41632915 0.44553401] mean value: 0.4562846929723249 key: test_accuracy value: [0.625 0.375 0.86666667 0.53333333 0.86666667 0.73333333 0.66666667 0.46666667 0.66666667 0.6 ] mean value: 0.64 key: train_accuracy value: [0.75 0.77205882 0.72262774 0.73722628 0.68613139 0.72262774 0.7080292 0.74452555 0.7080292 0.72262774] mean value: 0.727388364104766 key: test_fscore value: [0.625 0.375 0.85714286 0.58823529 0.83333333 0.75 0.66666667 0.6 0.66666667 0.625 ] mean value: 0.658704481792717 key: train_fscore value: [0.76056338 0.7862069 0.74324324 0.75342466 0.68148148 0.72463768 0.70588235 0.73684211 0.71014493 0.72463768] mean value: 0.7327064407151792 key: test_precision value: [0.625 0.375 0.85714286 0.5 1. 
0.66666667 0.71428571 0.5 0.71428571 0.625 ] mean value: 0.6577380952380952 key: train_precision value: [0.72972973 0.74025974 0.69620253 0.71428571 0.6969697 0.72463768 0.70588235 0.75384615 0.7 0.71428571] mean value: 0.7176099315122916 key: test_recall value: [0.625 0.375 0.85714286 0.71428571 0.71428571 0.85714286 0.625 0.75 0.625 0.625 ] mean value: 0.6767857142857143 key: train_recall value: [0.79411765 0.83823529 0.79710145 0.79710145 0.66666667 0.72463768 0.70588235 0.72058824 0.72058824 0.73529412] mean value: 0.7500213128729752 key: test_roc_auc value: [0.625 0.375 0.86607143 0.54464286 0.85714286 0.74107143 0.66964286 0.44642857 0.66964286 0.59821429] mean value: 0.6392857142857143 key: train_roc_auc value: [0.75 0.77205882 0.72208014 0.73678602 0.68627451 0.72261296 0.70801364 0.74435209 0.7081202 0.72271952] mean value: 0.7273017902813299 key: test_jcc value: [0.45454545 0.23076923 0.75 0.41666667 0.71428571 0.6 0.5 0.42857143 0.5 0.45454545] mean value: 0.504938394938395 key: train_jcc value: [0.61363636 0.64772727 0.59139785 0.6043956 0.51685393 0.56818182 0.54545455 0.58333333 0.5505618 0.56818182] mean value: 0.57897243357102 MCC on Blind test: 0.1 Accuracy on Blind test: 0.58 Model_name: K-Nearest Neighbors Model func: KNeighborsClassifier() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, 
oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', KNeighborsClassifier())]) key: fit_time value: [0.00700665 0.00714684 0.00702357 0.007092 0.00712657 0.0071609 0.00703955 0.00731921 0.00701737 0.00706315] mean value: 0.007099580764770508 key: score_time value: [0.00979042 0.00942588 0.00939441 0.00933671 0.00949192 0.00942111 0.009372 0.00936747 0.00985074 0.00934005] mean value: 0.009479069709777832 key: test_mcc value: [ 0.51639778 0.25819889 0.73214286 0.21821789 0.75592895 0.32732684 -0.02620712 0.32732684 0.73214286 0.60714286] mean value: 0.44486186267144306 key: train_mcc value: [0.72254413 0.69486799 0.68583647 0.72439971 0.62437433 0.68322489 0.68163703 0.68163703 0.68011153 0.65087548] mean value: 0.6829508591825769 key: test_accuracy value: [0.75 0.625 0.86666667 0.6 0.86666667 0.66666667 0.46666667 0.66666667 0.86666667 0.8 ] mean value: 0.7175 key: train_accuracy value: [0.86029412 0.84558824 0.83941606 0.86131387 0.81021898 0.83941606 0.83941606 0.83941606 0.83941606 0.82481752] mean value: 0.8399313009875483 key: test_fscore value: [0.77777778 0.66666667 0.85714286 0.625 0.83333333 0.61538462 0.2 0.70588235 0.875 0.8 ] mean value: 0.6956187603246426 key: train_fscore value: [0.86524823 0.85314685 0.85135135 0.86713287 0.82191781 0.84931507 0.84507042 0.84507042 0.84285714 0.82857143] mean value: 0.8469681591792749 key: test_precision value: [0.7 0.6 0.85714286 0.55555556 1. 
0.66666667 0.5 0.66666667 0.875 0.85714286] mean value: 0.7278174603174603 key: train_precision value: [0.83561644 0.81333333 0.79746835 0.83783784 0.77922078 0.80519481 0.81081081 0.81081081 0.81944444 0.80555556] mean value: 0.8115293169994922 key: test_recall value: [0.875 0.75 0.85714286 0.71428571 0.71428571 0.57142857 0.125 0.75 0.875 0.75 ] mean value: 0.6982142857142857 key: train_recall value: [0.89705882 0.89705882 0.91304348 0.89855072 0.86956522 0.89855072 0.88235294 0.88235294 0.86764706 0.85294118] mean value: 0.8859121909633418 key: test_roc_auc value: [0.75 0.625 0.86607143 0.60714286 0.85714286 0.66071429 0.49107143 0.66071429 0.86607143 0.80357143] mean value: 0.71875 key: train_roc_auc value: [0.86029412 0.84558824 0.83887468 0.86104007 0.80978261 0.83898124 0.8397272 0.8397272 0.83962063 0.82502131] mean value: 0.8398657289002557 key: test_jcc value: [0.63636364 0.5 0.75 0.45454545 0.71428571 0.44444444 0.11111111 0.54545455 0.77777778 0.66666667] mean value: 0.560064935064935 key: train_jcc value: [0.7625 0.74390244 0.74117647 0.7654321 0.69767442 0.73809524 0.73170732 0.73170732 0.72839506 0.70731707] mean value: 0.7347907434123415 MCC on Blind test: 0.06 Accuracy on Blind test: 0.68 Model_name: SVM Model func: SVC(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, 
random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SVC(random_state=42))]) key: fit_time value: [0.00901961 0.00860167 0.0086236 0.00857639 0.00859451 0.00781941 0.00762987 0.00763583 0.00852704 0.00775576] mean value: 0.008278369903564453 key: score_time value: [0.00887251 0.00862575 0.00865078 0.00868511 0.00861216 0.00795341 0.00790024 0.00792432 0.00796032 0.00794959] mean value: 0.008313417434692383 key: test_mcc value: [0.62994079 0.62994079 0.73214286 0.56407607 0.87287156 0.60714286 0.33928571 0.18898224 0.75592895 0.875 ] mean value: 0.6195311823553656 key: train_mcc value: [0.77949606 0.85331034 0.85540562 0.86948194 0.82629176 0.86939892 0.8978896 0.83947987 0.85400682 0.86868474] mean value: 0.8513445663864698 key: test_accuracy value: [0.8125 0.8125 0.86666667 0.73333333 0.93333333 0.8 0.66666667 0.6 0.86666667 0.93333333] mean value: 0.8025 key: train_accuracy value: [0.88970588 0.92647059 0.9270073 0.93430657 0.91240876 0.93430657 0.94890511 0.91970803 0.9270073 0.93430657] mean value: 0.9254132674967797 key: test_fscore value: [0.82352941 0.8 0.85714286 0.77777778 0.92307692 0.8 0.66666667 0.66666667 0.88888889 0.93333333] mean value: 0.8137082525317819 key: train_fscore value: [0.88888889 0.92753623 0.92957746 0.93333333 0.91044776 0.93617021 0.94814815 0.91851852 0.92647059 0.93333333] mean value: 0.9252424481090294 key: test_precision value: [0.77777778 0.85714286 0.85714286 0.63636364 1. 0.75 0.71428571 0.6 0.8 1. 
] mean value: 0.7992712842712842 key: train_precision value: [0.89552239 0.91428571 0.90410959 0.95454545 0.93846154 0.91666667 0.95522388 0.92537313 0.92647059 0.94029851] mean value: 0.9270957461683526 key: test_recall value: [0.875 0.75 0.85714286 1. 0.85714286 0.85714286 0.625 0.75 1. 0.875 ] mean value: 0.8446428571428571 key: train_recall value: [0.88235294 0.94117647 0.95652174 0.91304348 0.88405797 0.95652174 0.94117647 0.91176471 0.92647059 0.92647059] mean value: 0.9239556692242115 key: test_roc_auc value: [0.8125 0.8125 0.86607143 0.75 0.92857143 0.80357143 0.66964286 0.58928571 0.85714286 0.9375 ] mean value: 0.8026785714285715 key: train_roc_auc value: [0.88970588 0.92647059 0.92679028 0.93446292 0.91261722 0.93414322 0.9488491 0.91965047 0.92700341 0.93424979] mean value: 0.9253942881500427 key: test_jcc value: [0.7 0.66666667 0.75 0.63636364 0.85714286 0.66666667 0.5 0.5 0.8 0.875 ] mean value: 0.6951839826839826 key: train_jcc value: [0.8 0.86486486 0.86842105 0.875 0.83561644 0.88 0.90140845 0.84931507 0.8630137 0.875 ] mean value: 0.8612639573680121 MCC on Blind test: 0.13 Accuracy on Blind test: 0.69 Model_name: MLP Model func: MLPClassifier(max_iter=500, random_state=42) List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. warnings.warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/neural_network/_multilayer_perceptron.py:702: ConvergenceWarning: Stochastic Optimizer: Maximum iterations (500) reached and the optimization hasn't converged yet. 
warnings.warn( [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', 
RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MLPClassifier(max_iter=500, random_state=42))]) key: fit_time value: [0.47060013 0.6176157 0.50893569 0.47440553 0.48640704 0.68746996 0.48285437 0.49517989 0.48710799 0.6235292 ] mean value: 0.5334105491638184 key: score_time value: [0.01105475 0.01343441 0.01317406 0.01111579 0.01340437 0.01400685 0.01163292 0.01111388 0.01380134 0.01445436] mean value: 0.012719273567199707 key: test_mcc value: [0.77459667 0.75 0.87287156 0.49099025 1. 0.73214286 0.47245559 0.32732684 0.75592895 0.73214286] mean value: 0.6908455570136127 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.875 0.875 0.93333333 0.73333333 1. 0.86666667 0.73333333 0.66666667 0.86666667 0.86666667] mean value: 0.8416666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.88888889 0.875 0.92307692 0.75 1. 0.85714286 0.77777778 0.70588235 0.88888889 0.875 ] mean value: 0.8541657688716512 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.8 0.875 1. 0.66666667 1. 
0.85714286 0.7 0.66666667 0.8 0.875 ] mean value: 0.824047619047619 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 0.85714286 0.85714286 1. 0.85714286 0.875 0.75 1. 0.875 ] mean value: 0.8946428571428571 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.875 0.875 0.92857143 0.74107143 1. 0.86607143 0.72321429 0.66071429 0.85714286 0.86607143] mean value: 0.8392857142857143 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.8 0.77777778 0.85714286 0.6 1. 0.75 0.63636364 0.54545455 0.8 0.77777778] mean value: 0.7544516594516595 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.06 Accuracy on Blind test: 0.69 Model_name: Decision Tree Model func: DecisionTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, 
monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', DecisionTreeClassifier(random_state=42))]) key: fit_time value: [0.01054811 0.01029015 0.00749731 0.00756335 0.00791764 0.00744152 0.0080564 0.0079093 0.00725198 0.00789857] mean value: 0.008237433433532716 key: score_time value: [0.01101589 0.00920248 0.00816894 0.00859499 0.00827861 0.00804591 0.00809813 0.00824928 0.00804496 0.00812697] mean value: 0.008582615852355957 key: test_mcc value: [1. 0.77459667 0.875 0.76376262 1. 0.87287156 1. 1. 0.87287156 0.875 ] mean value: 0.9034102406955395 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.875 0.93333333 0.86666667 1. 0.93333333 1. 1. 0.93333333 0.93333333] mean value: 0.9475 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.88888889 0.93333333 0.875 1. 0.92307692 1. 1. 0.94117647 0.93333333] mean value: 0.9494808949220714 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.8 0.875 0.77777778 1. 1. 1. 1. 0.88888889 1. ] mean value: 0.9341666666666667 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9732142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.875 0.9375 0.875 1. 0.92857143 1. 1. 0.92857143 0.9375 ] mean value: 0.9482142857142857 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.8 0.875 0.77777778 1. 0.85714286 1. 1. 
0.88888889 0.875 ] mean value: 0.9073809523809524 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.13 Accuracy on Blind test: 0.85 Model_name: Extra Trees Model func: ExtraTreesClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian 
Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreesClassifier(random_state=42))]) key: fit_time value: [0.08002043 0.08039927 0.08053231 0.08475113 0.0849824 0.07976866 0.07944989 0.08087707 0.07947731 0.0836072 ] mean value: 0.08138656616210938 key: score_time value: [0.01744008 0.01705742 0.01676226 0.01815081 0.01665783 0.01678061 0.01786375 0.0182426 0.01667714 0.01768732] mean value: 0.017331981658935548 key: test_mcc value: [0.8819171 0.75 0.87287156 0.66143783 1. 0.87287156 0.46428571 0.76376262 0.875 0.76376262] mean value: 0.7905908999279945 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.875 0.93333333 0.8 1. 0.93333333 0.73333333 0.86666667 0.93333333 0.86666667] mean value: 0.8879166666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.875 0.92307692 0.82352941 1. 
0.92307692 0.75 0.85714286 0.93333333 0.85714286] mean value: 0.8883478776125835 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.875 1. 0.7 1. 1. 0.75 1. 1. 1. ] mean value: 0.9213888888888889 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 0.85714286 1. 1. 0.85714286 0.75 0.75 0.875 0.75 ] mean value: 0.8714285714285714 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.875 0.92857143 0.8125 1. 0.92857143 0.73214286 0.875 0.9375 0.875 ] mean value: 0.8901785714285715 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.77777778 0.85714286 0.7 1. 0.85714286 0.6 0.75 0.875 0.75 ] mean value: 0.805595238095238 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.1 Accuracy on Blind test: 0.81 Model_name: Extra Tree Model func: ExtraTreeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, 
gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', ExtraTreeClassifier(random_state=42))]) key: fit_time value: [0.00745249 0.0100553 0.00690699 0.00686359 0.00682211 0.0068984 0.00738096 0.00688457 0.00711179 0.00706244] mean value: 0.007343864440917969 key: score_time value: [0.00842643 0.00823832 0.00787807 0.00817585 0.00788713 0.00785685 0.00784945 0.00778127 0.00796604 0.00789261] mean value: 0.007995200157165528 key: test_mcc value: [1. 0.40451992 0.60714286 0.875 0.76376262 0.33928571 0.76376262 0.46428571 0.75592895 0.875 ] mean value: 0.6848688380862632 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.6875 0.8 0.93333333 0.86666667 0.66666667 0.86666667 0.73333333 0.86666667 0.93333333] mean value: 0.8354166666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.73684211 0.8 0.93333333 0.875 0.66666667 0.85714286 0.75 0.88888889 0.93333333] mean value: 0.8441207184628238 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.63636364 0.75 0.875 0.77777778 0.625 1. 0.75 0.8 1. ] mean value: 0.8214141414141414 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.875 0.85714286 1. 1. 0.71428571 0.75 0.75 1. 0.875 ] mean value: 0.8821428571428571 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.6875 0.80357143 0.9375 0.875 0.66964286 0.875 0.73214286 0.85714286 0.9375 ] mean value: 0.8375 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [1. 0.58333333 0.66666667 0.875 0.77777778 0.5 0.75 0.6 0.8 0.875 ] mean value: 0.7427777777777778 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.73 Model_name: Random Forest Model func: RandomForestClassifier(n_estimators=1000, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(n_estimators=1000, random_state=42))]) key: fit_time value: [0.99027419 1.03364635 0.99537587 0.99380183 1.01279187 1.00695038 1.00400448 0.99069548 0.98471999 0.9896822 ] mean value: 1.000194263458252 key: score_time value: [0.09284711 0.09792686 0.09596872 0.09674335 0.097049 0.09700847 0.09636211 0.08898997 0.08923626 0.15491176] mean value: 0.10070436000823975 key: test_mcc value: [0.8819171 0.8819171 0.875 0.76376262 1. 0.87287156 0.60714286 0.87287156 1. 0.73214286] mean value: 0.848762565937602 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.9375 0.93333333 0.86666667 1. 0.93333333 0.8 0.93333333 1. 0.86666667] mean value: 0.9208333333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.94117647 0.93333333 0.875 1. 
0.92307692 0.8 0.94117647 1. 0.875 ] mean value: 0.9229939668174962 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.88888889 0.875 0.77777778 1. 1. 0.85714286 0.88888889 1. 0.875 ] mean value: 0.9051587301587302 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 0.75 1. 1. 0.875 ] mean value: 0.9482142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.9375 0.9375 0.875 1. 0.92857143 0.80357143 0.92857143 1. 0.86607143] mean value: 0.9214285714285715 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [0.88888889 0.88888889 0.875 0.77777778 1. 0.85714286 0.66666667 0.88888889 1. 0.77777778] mean value: 0.8621031746031745 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.83 Model_name: Random Forest2 Model func: RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, 
colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. warn( /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/ensemble/_forest.py:427: FutureWarning: `max_features='auto'` has been deprecated in 1.1 and will be removed in 1.3. To keep the past behaviour, explicitly set `max_features='sqrt'` or remove this parameter as it is also the default value for RandomForestClassifiers and ExtraTreesClassifiers. 
warn( Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.79997134 0.8617053 0.82350397 0.86626768 0.86152625 0.8775835 0.89794159 0.84732342 0.82997847 0.88673472] mean value: 0.8552536249160767 key: score_time value: [0.23055267 0.18991017 0.19632292 0.2545855 0.13287044 0.18487072 0.21556759 0.20604992 0.17664123 0.12801123] mean value: 0.19153823852539062 key: test_mcc value: [0.8819171 0.75 0.875 0.76376262 0.87287156 0.73214286 0.60714286 0.73214286 1. 0.73214286] mean value: 0.7947122709029568 key: train_mcc value: [0.97100831 0.94117647 0.95710706 0.98550418 0.95630861 0.97080136 0.98550418 0.98550725 0.97122151 0.98550725] mean value: 0.9709646177394017 key: test_accuracy value: [0.9375 0.875 0.93333333 0.86666667 0.93333333 0.86666667 0.8 0.86666667 1. 0.86666667] mean value: 0.8945833333333334 key: train_accuracy value: [0.98529412 0.97058824 0.97810219 0.99270073 0.97810219 0.98540146 0.99270073 0.99270073 0.98540146 0.99270073] mean value: 0.9853692571919279 key: test_fscore value: [0.94117647 0.875 0.93333333 0.875 0.92307692 0.85714286 0.8 0.875 1. 0.875 ] mean value: 0.8954729584141349 key: train_fscore value: [0.98550725 0.97058824 0.9787234 0.99280576 0.97810219 0.98550725 0.99259259 0.99270073 0.98550725 0.99270073] mean value: 0.9854735376303184 key: test_precision value: [0.88888889 0.875 0.875 0.77777778 1. 0.85714286 0.85714286 0.875 1. 0.875 ] mean value: 0.888095238095238 key: train_precision value: [0.97142857 0.97058824 0.95833333 0.98571429 0.98529412 0.98550725 1. 0.98550725 0.97142857 0.98550725] mean value: 0.9799308853976374 key: test_recall value: [1. 0.875 1. 1. 
0.85714286 0.85714286 0.75 0.875 1. 0.875 ] mean value: 0.9089285714285714 key: train_recall value: [1. 0.97058824 1. 1. 0.97101449 0.98550725 0.98529412 1. 1. 1. ] mean value: 0.9912404092071612 key: test_roc_auc value: [0.9375 0.875 0.9375 0.875 0.92857143 0.86607143 0.80357143 0.86607143 1. 0.86607143] mean value: 0.8955357142857143 key: train_roc_auc value: [0.98529412 0.97058824 0.97794118 0.99264706 0.97815431 0.98540068 0.99264706 0.99275362 0.98550725 0.99275362] mean value: 0.9853687127024723 key: test_jcc value: [0.88888889 0.77777778 0.875 0.77777778 0.85714286 0.75 0.66666667 0.77777778 1. 0.77777778] mean value: 0.8148809523809524 key: train_jcc value: [0.97142857 0.94285714 0.95833333 0.98571429 0.95714286 0.97142857 0.98529412 0.98550725 0.97142857 0.98550725] mean value: 0.9714641943734016 MCC on Blind test: 0.1 Accuracy on Blind test: 0.79 Model_name: Naive Bayes Model func: BernoulliNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, 
min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BernoulliNB())]) key: fit_time value: [0.01698375 0.00691271 0.00714707 0.00702286 0.00691867 0.00681043 0.00706601 0.0070343 0.00683665 0.0071075 ] mean value: 0.007983994483947755 key: score_time value: [0.01220894 0.00788188 0.00856304 0.00795507 0.0079546 0.00788164 0.00790548 0.00793004 0.00796223 0.00810909] mean value: 0.008435201644897462 key: test_mcc value: [ 0.25 -0.25 0.73214286 0.09449112 0.75592895 0.49099025 0.33928571 -0.13363062 0.33928571 0.19642857] mean value: 0.2814922553488389 key: train_mcc value: [0.50195781 0.54894692 0.44946013 0.47724794 0.37278745 0.44522592 0.41602728 0.48933032 0.41632915 0.44553401] mean value: 0.4562846929723249 key: test_accuracy value: [0.625 0.375 0.86666667 0.53333333 0.86666667 0.73333333 0.66666667 0.46666667 0.66666667 0.6 ] mean value: 0.64 key: train_accuracy value: [0.75 0.77205882 0.72262774 0.73722628 0.68613139 0.72262774 0.7080292 0.74452555 0.7080292 0.72262774] mean value: 0.727388364104766 key: test_fscore value: [0.625 0.375 0.85714286 0.58823529 0.83333333 0.75 0.66666667 0.6 0.66666667 0.625 ] mean value: 0.658704481792717 key: train_fscore value: [0.76056338 0.7862069 0.74324324 0.75342466 0.68148148 0.72463768 0.70588235 0.73684211 0.71014493 0.72463768] mean value: 0.7327064407151792 key: test_precision value: [0.625 0.375 0.85714286 0.5 1. 
0.66666667 0.71428571 0.5 0.71428571 0.625 ] mean value: 0.6577380952380952 key: train_precision value: [0.72972973 0.74025974 0.69620253 0.71428571 0.6969697 0.72463768 0.70588235 0.75384615 0.7 0.71428571] mean value: 0.7176099315122916 key: test_recall value: [0.625 0.375 0.85714286 0.71428571 0.71428571 0.85714286 0.625 0.75 0.625 0.625 ] mean value: 0.6767857142857143 key: train_recall value: [0.79411765 0.83823529 0.79710145 0.79710145 0.66666667 0.72463768 0.70588235 0.72058824 0.72058824 0.73529412] mean value: 0.7500213128729752 key: test_roc_auc value: [0.625 0.375 0.86607143 0.54464286 0.85714286 0.74107143 0.66964286 0.44642857 0.66964286 0.59821429] mean value: 0.6392857142857143 key: train_roc_auc value: [0.75 0.77205882 0.72208014 0.73678602 0.68627451 0.72261296 0.70801364 0.74435209 0.7081202 0.72271952] mean value: 0.7273017902813299 key: test_jcc value: [0.45454545 0.23076923 0.75 0.41666667 0.71428571 0.6 0.5 0.42857143 0.5 0.45454545] mean value: 0.504938394938395 key: train_jcc value: [0.61363636 0.64772727 0.59139785 0.6043956 0.51685393 0.56818182 0.54545455 0.58333333 0.5505618 0.56818182] mean value: 0.57897243357102 MCC on Blind test: 0.1 Accuracy on Blind test: 0.58 Model_name: XGBoost Model func: XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', 
GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=None, booster=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, enable_categorical=False, gamma=None, gpu_id=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), 
Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... interaction_constraints=None, learning_rate=None, max_delta_step=None, max_depth=None, min_child_weight=None, missing=nan, monotone_constraints=None, n_estimators=100, n_jobs=None, num_parallel_tree=None, predictor=None, random_state=42, reg_alpha=None, reg_lambda=None, scale_pos_weight=None, subsample=None, tree_method=None, use_label_encoder=False, validate_parameters=None, verbosity=0))]) key: fit_time value: [0.06262994 0.03505611 0.03692508 0.03597021 0.06148291 0.03540158 0.03483367 0.03505754 0.04629922 0.03492475] mean value: 0.04185810089111328 key: score_time value: [0.01055789 0.01049376 0.01050019 0.01044226 0.01041293 0.01036716 0.01037478 0.0117774 0.01043272 0.01040506] mean value: 0.010576415061950683 key: test_mcc value: [1. 0.8819171 0.875 0.76376262 1. 1. 0.87287156 1. 1. 0.875 ] mean value: 0.9268551280458139 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9375 0.93333333 0.86666667 1. 1. 0.93333333 1. 1. 0.93333333] mean value: 0.9604166666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94117647 0.93333333 0.875 1. 1. 0.94117647 1. 1. 0.93333333] mean value: 0.9624019607843137 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88888889 0.875 0.77777778 1. 1. 0.88888889 1. 1. 1. ] mean value: 0.9430555555555555 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.875] mean value: 0.9875 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9375 0.9375 0.875 1. 1. 0.92857143 1. 1. 
0.9375 ] mean value: 0.9616071428571429 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.88888889 0.875 0.77777778 1. 1. 0.88888889 1. 1. 0.875 ] mean value: 0.9305555555555556 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.12 Accuracy on Blind test: 0.84 Model_name: LDA Model func: LinearDiscriminantAnalysis() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', 
AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', LinearDiscriminantAnalysis())]) key: fit_time value: [0.01320124 0.01201296 0.01213074 0.01227403 0.01183081 0.01187682 0.01212811 0.01183581 0.01189661 0.03893161] mean value: 0.014811873435974121 key: score_time value: [0.0113101 0.01076269 0.01052332 0.01057029 0.0105257 0.01047421 0.0105381 0.01045918 0.01049614 0.01063371] mean value: 0.01062934398651123 key: test_mcc value: [0.77459667 0.77459667 0.73214286 0.66143783 0.87287156 0.87287156 0.75592895 0.47245559 0.64465837 1. ] mean value: 0.7561560053780203 key: train_mcc value: [0.92898531 0.92737353 0.91392776 0.97120941 0.91277477 0.94318882 0.88668406 0.94323594 0.91597649 0.92791659] mean value: 0.927127267186985 key: test_accuracy value: [0.875 0.875 0.86666667 0.8 0.93333333 0.93333333 0.86666667 0.73333333 0.8 1. 
] mean value: 0.8683333333333334 key: train_accuracy value: [0.96323529 0.96323529 0.95620438 0.98540146 0.95620438 0.97080292 0.94160584 0.97080292 0.95620438 0.96350365] mean value: 0.9627200515242593 key: test_fscore value: [0.88888889 0.88888889 0.85714286 0.82352941 0.92307692 0.92307692 0.88888889 0.77777778 0.84210526 1. ] mean value: 0.8813375822663748 key: train_fscore value: [0.96453901 0.96402878 0.95774648 0.98571429 0.95714286 0.97183099 0.94366197 0.97142857 0.95774648 0.96402878] mean value: 0.9637868190827705 key: test_precision value: [0.8 0.8 0.85714286 0.7 1. 1. 0.8 0.7 0.72727273 1. ] mean value: 0.8384415584415584 key: train_precision value: [0.93150685 0.94366197 0.93150685 0.97183099 0.94366197 0.94520548 0.90540541 0.94444444 0.91891892 0.94366197] mean value: 0.9379804848259411 key: test_recall value: [1. 1. 0.85714286 1. 0.85714286 0.85714286 1. 0.875 1. 1. ] mean value: 0.9446428571428571 key: train_recall value: [1. 0.98529412 0.98550725 1. 0.97101449 1. 0.98529412 1. 1. 0.98529412] mean value: 0.9912404092071612 key: test_roc_auc value: [0.875 0.875 0.86607143 0.8125 0.92857143 0.92857143 0.85714286 0.72321429 0.78571429 1. ] mean value: 0.8651785714285715 key: train_roc_auc value: [0.96323529 0.96323529 0.95598892 0.98529412 0.95609548 0.97058824 0.94192242 0.97101449 0.95652174 0.96366155] mean value: 0.9627557544757033 key: test_jcc value: [0.8 0.8 0.75 0.7 0.85714286 0.85714286 0.8 0.63636364 0.72727273 1. 
] mean value: 0.7927922077922078 key: train_jcc value: [0.93150685 0.93055556 0.91891892 0.97183099 0.91780822 0.94520548 0.89333333 0.94444444 0.91891892 0.93055556] mean value: 0.9303078260587425 MCC on Blind test: 0.05 Accuracy on Blind test: 0.64 Model_name: Multinomial Model func: MultinomialNB() List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, 
random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', MultinomialNB())]) key: fit_time value: [0.00955915 0.0070889 0.00685692 0.00679111 0.00704551 0.0069344 0.00679803 0.0070312 0.00693274 0.00679612] mean value: 0.0071834087371826175 key: score_time value: [0.01106119 0.00804234 0.00776124 0.00791359 0.00784922 0.00808263 0.0079062 0.00786209 0.00788569 0.00790429] mean value: 0.008226847648620606 key: test_mcc value: [ 0.12598816 0.25819889 0.73214286 0.33928571 0.87287156 0.37796447 0.19642857 -0.13363062 0.46428571 0.6000992 ] mean value: 0.3833634515705724 key: train_mcc value: [0.48661135 0.51745489 0.47592003 0.50667322 0.41725962 0.50373224 0.50394373 0.5339313 0.53314859 0.47473887] mean value: 0.4953413853595016 key: test_accuracy value: [0.5625 0.625 0.86666667 0.66666667 0.93333333 0.66666667 0.6 0.46666667 0.73333333 0.8 ] mean value: 0.6920833333333334 key: train_accuracy value: [0.74264706 0.75735294 0.73722628 0.75182482 0.7080292 
0.75182482 0.75182482 0.76642336 0.76642336 0.73722628] mean value: 0.747080291970803 key: test_fscore value: [0.58823529 0.57142857 0.85714286 0.66666667 0.92307692 0.70588235 0.625 0.6 0.75 0.82352941] mean value: 0.7110962077138547 key: train_fscore value: [0.75177305 0.76923077 0.75 0.76712329 0.72222222 0.75714286 0.75362319 0.77142857 0.76811594 0.73913043] mean value: 0.7549790322558434 key: test_precision value: [0.55555556 0.66666667 0.85714286 0.625 1. 0.6 0.625 0.5 0.75 0.77777778] mean value: 0.6957142857142857 key: train_precision value: [0.7260274 0.73333333 0.72 0.72727273 0.69333333 0.74647887 0.74285714 0.75 0.75714286 0.72857143] mean value: 0.7325017093010533 key: test_recall value: [0.625 0.5 0.85714286 0.71428571 0.85714286 0.85714286 0.625 0.75 0.75 0.875 ] mean value: 0.7410714285714286 key: train_recall value: [0.77941176 0.80882353 0.7826087 0.8115942 0.75362319 0.76811594 0.76470588 0.79411765 0.77941176 0.75 ] mean value: 0.7792412617220801 key: test_roc_auc value: [0.5625 0.625 0.86607143 0.66964286 0.92857143 0.67857143 0.59821429 0.44642857 0.73214286 0.79464286] mean value: 0.6901785714285714 key: train_roc_auc value: [0.74264706 0.75735294 0.73689258 0.75138534 0.70769395 0.75170503 0.75191816 0.76662404 0.76651748 0.73731884] mean value: 0.7470055413469735 key: test_jcc value: [0.41666667 0.4 0.75 0.5 0.85714286 0.54545455 0.45454545 0.42857143 0.6 0.7 ] mean value: 0.5652380952380952 key: train_jcc value: [0.60227273 0.625 0.6 0.62222222 0.56521739 0.6091954 0.60465116 0.62790698 0.62352941 0.5862069 ] mean value: 0.6066202190949461 MCC on Blind test: 0.1 Accuracy on Blind test: 0.6 Model_name: Passive Aggresive Model func: PassiveAggressiveClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), 
('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 
'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', PassiveAggressiveClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.00778508 0.00735497 0.00741458 0.00750518 0.0074389 0.00734568 0.00739765 0.00764251 0.00755811 0.00754356] mean value: 0.007498621940612793 key: score_time value: [0.00792003 0.00796342 0.00831413 0.00790501 0.0078187 0.00797248 0.00821495 0.00799203 0.0079875 0.00809884] mean value: 0.008018708229064942 key: test_mcc value: [0.62994079 0.62994079 0.875 0.19642857 0.87287156 0.87287156 0.32732684 0.75592895 0.64465837 0.875 ] mean value: 0.6679967422606682 key: train_mcc value: [0.88580789 0.91334626 0.89863497 0.83795818 0.91240409 0.83063246 0.92787101 0.91281179 0.92710997 0.92709446] mean value: 0.8973671087701672 key: test_accuracy value: [0.8125 0.8125 0.93333333 0.6 0.93333333 0.93333333 0.66666667 0.86666667 0.8 0.93333333] mean value: 0.8291666666666667 key: train_accuracy value: [0.94117647 0.95588235 0.94890511 0.91240876 0.95620438 0.91240876 0.96350365 0.95620438 0.96350365 0.96350365] mean value: 0.9473701159295835 key: test_fscore value: [0.82352941 0.82352941 0.93333333 0.57142857 0.92307692 0.92307692 0.70588235 0.88888889 0.84210526 0.93333333] mean value: 0.8368184412766456 key: train_fscore value: [0.93846154 0.95714286 0.95035461 0.9047619 0.95652174 0.90769231 0.96240602 0.95652174 0.96350365 0.96296296] mean value: 0.9460329323884149 key: test_precision value: [0.77777778 0.77777778 0.875 0.57142857 1. 1. 0.66666667 0.8 0.72727273 1. 
] mean value: 0.819592352092352 key: train_precision value: [0.98387097 0.93055556 0.93055556 1. 0.95652174 0.96721311 0.98461538 0.94285714 0.95652174 0.97014925] mean value: 0.9622860453071885 key: test_recall value: [0.875 0.875 1. 0.57142857 0.85714286 0.85714286 0.75 1. 1. 0.875 ] mean value: 0.8660714285714286 key: train_recall value: [0.89705882 0.98529412 0.97101449 0.82608696 0.95652174 0.85507246 0.94117647 0.97058824 0.97058824 0.95588235] mean value: 0.9329283887468031 key: test_roc_auc value: [0.8125 0.8125 0.9375 0.59821429 0.92857143 0.92857143 0.66071429 0.85714286 0.78571429 0.9375 ] mean value: 0.8258928571428572 key: train_roc_auc value: [0.94117647 0.95588235 0.94874254 0.91304348 0.95620205 0.91283035 0.96334186 0.95630861 0.96355499 0.96344842] mean value: 0.9474531116794545 key: test_jcc value: [0.7 0.7 0.875 0.4 0.85714286 0.85714286 0.54545455 0.8 0.72727273 0.875 ] mean value: 0.7337012987012987 key: train_jcc value: [0.88405797 0.91780822 0.90540541 0.82608696 0.91666667 0.83098592 0.92753623 0.91666667 0.92957746 0.92857143] mean value: 0.8983362926190229 MCC on Blind test: 0.05 Accuracy on Blind test: 0.63 Model_name: Stochastic GDescent Model func: SGDClassifier(n_jobs=10, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', SGDClassifier(n_jobs=10, random_state=42))]) key: fit_time value: [0.01016855 0.0098474 0.0079248 0.00727248 0.00720954 0.00730157 0.00728512 0.0073278 0.00719166 0.00727534] mean value: 0.007880425453186036 key: score_time value: [0.010952 0.00936007 0.00861073 0.00789762 0.00792098 0.00781727 0.00834227 0.00790691 0.00788283 0.00789428] mean value: 0.008458495140075684 key: test_mcc value: [0.57735027 0.8819171 0.875 0.33928571 0.87287156 0.87287156 0.33928571 0.37796447 0.46428571 0.875 ] mean value: 0.6475832110632131 key: train_mcc value: [0.63408348 0.8979331 0.77817796 0.83063246 0.92951942 0.81712461 0.85977656 0.72794365 0.85721269 0.88920184] mean value: 0.822160576316637 key: test_accuracy value: [0.75 0.9375 0.93333333 0.66666667 0.93333333 0.93333333 0.66666667 0.66666667 0.73333333 0.93333333] mean value: 0.8154166666666667 key: train_accuracy value: [0.78676471 0.94852941 0.88321168 0.91240876 0.96350365 0.90510949 0.9270073 0.84671533 0.9270073 0.94160584] mean value: 0.9041863460712752 key: test_fscore value: [0.8 0.93333333 0.93333333 0.66666667 0.92307692 0.92307692 0.66666667 0.61538462 0.75 0.93333333] mean value: 0.8144871794871795 key: train_fscore value: [0.82424242 0.94736842 0.89333333 0.90769231 0.96240602 0.91156463 0.921875 0.8173913 0.92307692 0.9375 ] mean value: 0.904645035463338 key: test_precision value: [0.66666667 1. 0.875 0.625 1. 1. 0.71428571 0.8 0.75 1. ] mean value: 0.8430952380952381 key: train_precision value: [0.70103093 0.96923077 0.82716049 0.96721311 1. 0.85897436 0.98333333 1. 0.96774194 1. 
] mean value: 0.9274684933438643 key: test_recall value: [1. 0.875 1. 0.71428571 0.85714286 0.85714286 0.625 0.5 0.75 0.875 ] mean value: 0.8053571428571429 key: train_recall value: [1. 0.92647059 0.97101449 0.85507246 0.92753623 0.97101449 0.86764706 0.69117647 0.88235294 0.88235294] mean value: 0.8974637681159421 key: test_roc_auc value: [0.75 0.9375 0.9375 0.66964286 0.92857143 0.92857143 0.66964286 0.67857143 0.73214286 0.9375 ] mean value: 0.8169642857142857 key: train_roc_auc value: [0.78676471 0.94852941 0.88256607 0.91283035 0.96376812 0.90462489 0.92657715 0.84558824 0.92668372 0.94117647] mean value: 0.9039109121909633 key: test_jcc value: [0.66666667 0.875 0.875 0.5 0.85714286 0.85714286 0.5 0.44444444 0.6 0.875 ] mean value: 0.7050396825396825 key: train_jcc value: [0.70103093 0.9 0.80722892 0.83098592 0.92753623 0.8375 0.85507246 0.69117647 0.85714286 0.88235294] mean value: 0.8290026723550397 MCC on Blind test: 0.06 Accuracy on Blind test: 0.89 Model_name: AdaBoost Classifier Model func: AdaBoostClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, 
importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', AdaBoostClassifier(random_state=42))]) key: fit_time value: [0.07523441 0.06551671 0.06416392 0.06420016 0.06557775 0.06523657 0.06472826 0.06670904 0.06583929 0.06667423] mean value: 0.06638803482055664 key: score_time value: [0.01517701 0.01486087 0.01571703 0.01545548 0.01541901 0.01526618 0.01506066 0.01570487 0.01489067 0.01541162] mean value: 0.015296339988708496 key: test_mcc value: [0.8819171 0.8819171 0.875 0.66143783 1. 0.87287156 1. 0.87287156 0.87287156 0.875 ] mean value: 0.879388671797445 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [0.9375 0.9375 0.93333333 0.8 1. 0.93333333 1. 0.93333333 0.93333333 0.93333333] mean value: 0.9341666666666667 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [0.94117647 0.94117647 0.93333333 0.82352941 1. 0.92307692 1. 0.94117647 0.94117647 0.93333333] mean value: 0.9377978883861237 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [0.88888889 0.88888889 0.875 0.7 1. 1. 1. 0.88888889 0.88888889 1. ] mean value: 0.9130555555555555 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9732142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [0.9375 0.9375 0.9375 0.8125 1. 0.92857143 1. 0.92857143 0.92857143 0.9375 ] mean value: 0.9348214285714286 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] 
mean value: 1.0 key: test_jcc value: [0.88888889 0.88888889 0.875 0.7 1. 0.85714286 1. 0.88888889 0.88888889 0.875 ] mean value: 0.8862698412698412 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.08 Accuracy on Blind test: 0.76 Model_name: Bagging Classifier Model func: BaggingClassifier(n_jobs=10, oob_score=True, random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), 
('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42))]) key: fit_time value: [0.03459382 0.04311633 0.02597976 0.02609849 0.03556275 0.03000331 0.02980828 0.03237772 0.04908109 0.03363228] mean value: 0.034025382995605466 key: score_time value: [0.03137994 0.01657486 0.01867056 0.01809192 0.03612328 0.02216148 0.02189708 0.01990652 0.03687644 0.01487947] mean value: 0.023656153678894044 key: test_mcc value: [1. 0.8819171 0.875 0.76376262 0.87287156 0.87287156 1. 1. 1. 0.875 ] mean value: 0.9141422841402109 key: train_mcc value: [1. 1. 0.98550418 1. 0.98550725 1. 1. 0.98550418 1. 0.98550725] mean value: 0.9942022851330479 key: test_accuracy value: [1. 0.9375 0.93333333 0.86666667 0.93333333 0.93333333 1. 1. 1. 0.93333333] mean value: 0.95375 key: train_accuracy value: [1. 1. 0.99270073 1. 0.99270073 1. 1. 0.99270073 1. 0.99270073] mean value: 0.997080291970803 key: test_fscore value: [1. 
0.94117647 0.93333333 0.875 0.92307692 0.92307692 1. 1. 1. 0.93333333] mean value: 0.9528996983408748 key: train_fscore value: [1. 1. 0.99280576 1. 0.99270073 1. 1. 0.99259259 1. 0.99270073] mean value: 0.9970799807842291 key: test_precision value: [1. 0.88888889 0.875 0.77777778 1. 1. 1. 1. 1. 1. ] mean value: 0.9541666666666666 key: train_precision value: [1. 1. 0.98571429 1. 1. 1. 1. 1. 1. 0.98550725] mean value: 0.9971221532091097 key: test_recall value: [1. 1. 1. 1. 0.85714286 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9589285714285715 key: train_recall value: [1. 1. 1. 1. 0.98550725 1. 1. 0.98529412 1. 1. ] mean value: 0.997080136402387 key: test_roc_auc value: [1. 0.9375 0.9375 0.875 0.92857143 0.92857143 1. 1. 1. 0.9375 ] mean value: 0.9544642857142858 key: train_roc_auc value: [1. 1. 0.99264706 1. 0.99275362 1. 1. 0.99264706 1. 0.99275362] mean value: 0.997080136402387 key: test_jcc value: [1. 0.88888889 0.875 0.77777778 0.85714286 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9130952380952381 key: train_jcc value: [1. 1. 0.98571429 1. 0.98550725 1. 1. 0.98529412 1. 
0.98550725] mean value: 0.9942022896114968 MCC on Blind test: 0.12 Accuracy on Blind test: 0.84 Model_name: Gaussian Process Model func: GaussianProcessClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', 
GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GaussianProcessClassifier(random_state=42))]) key: fit_time value: [0.03075981 0.03832865 0.06963396 0.06718922 0.03995085 0.03923106 0.03885245 0.061131 0.04299402 0.03564787] mean value: 0.04637188911437988 key: score_time value: [0.02218819 0.01115394 0.01116896 0.03056479 0.02163672 0.02096963 0.02151918 0.03110862 0.01723385 0.01872468] mean value: 0.02062685489654541 key: test_mcc value: [0.67419986 0.75 0.87287156 0.37796447 1. 0.73214286 0.46428571 0.46428571 1. 0.76376262] mean value: 0.7099512797956697 key: train_mcc value: [0.95598573 0.98540068 0.97080136 0.95630861 0.97080136 0.95630861 0.97080136 0.97080136 0.97080136 0.97080136] mean value: 0.9678811811884551 key: test_accuracy value: [0.8125 0.875 0.93333333 0.66666667 1. 0.86666667 0.73333333 0.73333333 1. 
0.86666667] mean value: 0.84875 key: train_accuracy value: [0.97794118 0.99264706 0.98540146 0.97810219 0.98540146 0.97810219 0.98540146 0.98540146 0.98540146 0.98540146] mean value: 0.9839201373980249 key: test_fscore value: [0.84210526 0.875 0.92307692 0.70588235 1. 0.85714286 0.75 0.75 1. 0.85714286] mean value: 0.8560350253461708 key: train_fscore value: [0.97777778 0.99259259 0.98550725 0.97810219 0.98550725 0.97810219 0.98529412 0.98529412 0.98529412 0.98529412] mean value: 0.9838765713274273 key: test_precision value: [0.72727273 0.875 1. 0.6 1. 0.85714286 0.75 0.75 1. 1. ] mean value: 0.8559415584415584 key: train_precision value: [0.98507463 1. 0.98550725 0.98529412 0.98550725 0.98529412 0.98529412 0.98529412 0.98529412 0.98529412] mean value: 0.9867853825501648 key: test_recall value: [1. 0.875 0.85714286 0.85714286 1. 0.85714286 0.75 0.75 1. 0.75 ] mean value: 0.8696428571428572 key: train_recall value: [0.97058824 0.98529412 0.98550725 0.97101449 0.98550725 0.97101449 0.98529412 0.98529412 0.98529412 0.98529412] mean value: 0.9810102301790282 key: test_roc_auc value: [0.8125 0.875 0.92857143 0.67857143 1. 0.86607143 0.73214286 0.73214286 1. 0.875 ] mean value: 0.85 key: train_roc_auc value: [0.97794118 0.99264706 0.98540068 0.97815431 0.98540068 0.97815431 0.98540068 0.98540068 0.98540068 0.98540068] mean value: 0.9839300937766412 key: test_jcc value: [0.72727273 0.77777778 0.85714286 0.54545455 1. 0.75 0.6 0.6 1. 
0.75 ] mean value: 0.7607647907647908 key: train_jcc value: [0.95652174 0.98529412 0.97142857 0.95714286 0.97142857 0.95714286 0.97101449 0.97101449 0.97101449 0.97101449] mean value: 0.9683016684934843 MCC on Blind test: 0.06 Accuracy on Blind test: 0.66 Model_name: Gradient Boosting Model func: GradientBoostingClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', 
BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', GradientBoostingClassifier(random_state=42))]) key: fit_time value: [0.09868574 0.10024285 0.09119558 0.08083391 0.09357262 0.08993793 0.10161471 0.10149956 0.09327483 0.08393335] mean value: 0.0934791088104248 key: score_time value: [0.00927162 0.00913954 0.00923514 0.00928712 0.00947499 0.00919628 0.00924039 0.00923944 0.00928307 0.00930262] mean value: 0.009267020225524902 key: test_mcc value: [1. 0.8819171 0.875 0.76376262 1. 0.87287156 1. 1. 0.87287156 0.73214286] mean value: 0.8998565698544966 key: train_mcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_accuracy value: [1. 0.9375 0.93333333 0.86666667 1. 0.93333333 1. 1. 0.93333333 0.86666667] mean value: 0.9470833333333334 key: train_accuracy value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_fscore value: [1. 0.94117647 0.93333333 0.875 1. 0.92307692 1. 1. 
0.94117647 0.875 ] mean value: 0.9488763197586727 key: train_fscore value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_precision value: [1. 0.88888889 0.875 0.77777778 1. 1. 1. 1. 0.88888889 0.875 ] mean value: 0.9305555555555556 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 1. 1. 1. 1. 0.85714286 1. 1. 1. 0.875 ] mean value: 0.9732142857142857 key: train_recall value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_roc_auc value: [1. 0.9375 0.9375 0.875 1. 0.92857143 1. 1. 0.92857143 0.86607143] mean value: 0.9473214285714285 key: train_roc_auc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_jcc value: [1. 0.88888889 0.875 0.77777778 1. 0.85714286 1. 1. 0.88888889 0.77777778] mean value: 0.906547619047619 key: train_jcc value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 MCC on Blind test: 0.11 Accuracy on Blind test: 0.83 Model_name: QDA Model func: QuadraticDiscriminantAnalysis() List of models: /home/tanu/anaconda3/envs/UQ/lib/python3.9/site-packages/sklearn/discriminant_analysis.py:887: UserWarning: Variables are collinear warnings.warn("Variables are collinear")
[('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', QuadraticDiscriminantAnalysis())]) key: fit_time value: [0.00916886 0.01091504 0.01081634 0.01080394 0.0134356 0.02716422 0.01087403 0.01102185 0.01133323 0.01125884] mean value: 0.012679195404052735 key: score_time value: [0.01023698 0.01037884 0.01042628 0.01103234 0.01079631 0.01123476 0.01329875 0.01280212 0.01062632 0.01068163] mean value: 0.011151432991027832 key: test_mcc value: [0.8819171 0.67419986 0.75592895 0.75592895 0.75592895 0.53452248 0.37796447 0.76376262 0.76376262 0.76376262] mean value: 0.7027678608518798 key: train_mcc value: [1. 0.90184995 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9901849950564579 key: test_accuracy value: [0.9375 0.8125 0.86666667 0.86666667 0.86666667 0.73333333 0.66666667 0.86666667 0.86666667 0.86666667] mean value: 0.835 key: train_accuracy value: [1. 0.94852941 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9948529411764706 key: test_fscore value: [0.94117647 0.76923077 0.83333333 0.83333333 0.83333333 0.6 0.61538462 0.85714286 0.85714286 0.85714286] mean value: 0.7997220426632191 key: train_fscore value: [1. 0.94573643 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9945736434108527 key: test_precision value: [0.88888889 1. 1. 1. 1. 1. 0.8 1. 1. 1. ] mean value: 0.9688888888888889 key: train_precision value: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] mean value: 1.0 key: test_recall value: [1. 0.625 0.71428571 0.71428571 0.71428571 0.42857143 0.5 0.75 0.75 0.75 ] mean value: 0.6946428571428571 key: train_recall value: [1. 0.89705882 1. 1. 1. 1. 1. 1. 1. 1. 
] mean value: 0.9897058823529412 key: test_roc_auc value: [0.9375 0.8125 0.85714286 0.85714286 0.85714286 0.71428571 0.67857143 0.875 0.875 0.875 ] mean value: 0.8339285714285715 key: train_roc_auc value: [1. 0.94852941 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9948529411764706 key: test_jcc value: [0.88888889 0.625 0.71428571 0.71428571 0.71428571 0.42857143 0.44444444 0.75 0.75 0.75 ] mean value: 0.6779761904761905 key: train_jcc value: [1. 0.89705882 1. 1. 1. 1. 1. 1. 1. 1. ] mean value: 0.9897058823529412 MCC on Blind test: -0.02 Accuracy on Blind test: 0.95 Model_name: Ridge Classifier Model func: RidgeClassifier(random_state=42) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', 
LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifier(random_state=42))]) key: fit_time value: [0.01154399 0.01006269 0.00780892 0.00763559 0.00742674 0.00739622 0.00755072 0.00743032 0.00746632 0.00746202] mean value: 0.008178353309631348 key: score_time value: [0.01060176 0.00935245 0.00819874 0.00818491 0.00788951 0.00785613 0.00786829 0.00791764 0.00788474 0.0078702 ] mean value: 0.008362436294555664 key: test_mcc value: [0.75 0.62994079 0.73214286 0.49099025 0.87287156 0.87287156 0.64465837 0.6000992 0.64465837 0.875 ] mean value: 0.7113232961000079 key: train_mcc value: [0.83832595 0.86849267 0.85434012 0.91240409 0.86868474 0.8978896 0.88360693 0.82480818 0.86948194 0.8555278 ] mean value: 0.8673562022561286 key: test_accuracy value: [0.875 0.8125 0.86666667 0.73333333 0.93333333 0.93333333 0.8 0.8 0.8 0.93333333] mean value: 0.84875 key: train_accuracy value: [0.91911765 0.93382353 0.9270073 0.95620438 0.93430657 0.94890511 0.94160584 0.91240876 0.93430657 0.9270073 ] mean value: 0.9334693001288106 key: test_fscore value: [0.875 0.82352941 0.85714286 0.75 0.92307692 0.92307692 0.84210526 0.82352941 0.84210526 0.93333333] mean value: 0.8592899386475238 key: train_fscore value: [0.91970803 0.9352518 0.92857143 0.95652174 0.9352518 0.94964029 0.94202899 0.91176471 0.9352518 0.92857143] mean value: 0.9342562000313209 key: test_precision value: [0.875 0.77777778 0.85714286 0.66666667 1. 1. 0.72727273 0.77777778 0.72727273 1. 
] mean value: 0.8408910533910534 key: train_precision value: [0.91304348 0.91549296 0.91549296 0.95652174 0.92857143 0.94285714 0.92857143 0.91176471 0.91549296 0.90277778] mean value: 0.9230586574290872 key: test_recall value: [0.875 0.875 0.85714286 0.85714286 0.85714286 0.85714286 1. 0.875 1. 0.875 ] mean value: 0.8928571428571428 key: train_recall value: [0.92647059 0.95588235 0.94202899 0.95652174 0.94202899 0.95652174 0.95588235 0.91176471 0.95588235 0.95588235] mean value: 0.9458866155157716 key: test_roc_auc value: [0.875 0.8125 0.86607143 0.74107143 0.92857143 0.92857143 0.78571429 0.79464286 0.78571429 0.9375 ] mean value: 0.8455357142857143 key: train_roc_auc value: [0.91911765 0.93382353 0.92689685 0.95620205 0.93424979 0.9488491 0.94170929 0.91240409 0.93446292 0.92721654] mean value: 0.933493179880648 key: test_jcc value: [0.77777778 0.7 0.75 0.6 0.85714286 0.85714286 0.72727273 0.7 0.72727273 0.875 ] mean value: 0.7571608946608946 key: train_jcc value: [0.85135135 0.87837838 0.86666667 0.91666667 0.87837838 0.90410959 0.89041096 0.83783784 0.87837838 0.86666667] mean value: 0.876884487226953 MCC on Blind test: 0.07 Accuracy on Blind test: 0.7 Model_name: Ridge ClassifierCV Model func: RidgeClassifierCV(cv=10) List of models: [('Logistic Regression', LogisticRegression(random_state=42)), ('Logistic RegressionCV', LogisticRegressionCV(random_state=42)), ('Gaussian NB', GaussianNB()), ('Naive Bayes', BernoulliNB()), ('K-Nearest Neighbors', KNeighborsClassifier()), ('SVM', SVC(random_state=42)), ('MLP', MLPClassifier(max_iter=500, random_state=42)), ('Decision Tree', DecisionTreeClassifier(random_state=42)), ('Extra Trees', ExtraTreesClassifier(random_state=42)), ('Extra Tree', ExtraTreeClassifier(random_state=42)), ('Random Forest', RandomForestClassifier(n_estimators=1000, random_state=42)), ('Random Forest2', RandomForestClassifier(max_features='auto', min_samples_leaf=5, n_estimators=1000, n_jobs=10, oob_score=True, random_state=42)), ('Naive Bayes', 
BernoulliNB()), ('XGBoost', XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, enable_categorical=False, gamma=0, gpu_id=-1, importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=12, num_parallel_tree=1, predictor='auto', random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact', use_label_encoder=False, validate_parameters=1, verbosity=0)), ('LDA', LinearDiscriminantAnalysis()), ('Multinomial', MultinomialNB()), ('Passive Aggresive', PassiveAggressiveClassifier(n_jobs=10, random_state=42)), ('Stochastic GDescent', SGDClassifier(n_jobs=10, random_state=42)), ('AdaBoost Classifier', AdaBoostClassifier(random_state=42)), ('Bagging Classifier', BaggingClassifier(n_jobs=10, oob_score=True, random_state=42)), ('Gaussian Process', GaussianProcessClassifier(random_state=42)), ('Gradient Boosting', GradientBoostingClassifier(random_state=42)), ('QDA', QuadraticDiscriminantAnalysis()), ('Ridge Classifier', RidgeClassifier(random_state=42)), ('Ridge ClassifierCV', RidgeClassifierCV(cv=10))] Running model pipeline: Pipeline(steps=[('prep', ColumnTransformer(remainder='passthrough', transformers=[('num', MinMaxScaler(), Index(['ligand_distance', 'ligand_affinity_change', 'duet_stability_change', 'ddg_foldx', 'deepddg', 'ddg_dynamut2', 'mmcsm_lig', 'contacts', 'mcsm_na_affinity', 'rsa', 'kd_values', 'rd_values', 'electro_rr', 'electro_mm', 'electro_sm', 'electro_ss', 'disulfide_rr... 
'volumetric_ss', 'consurf_score', 'snap2_score', 'provean_score', 'maf', 'logorI', 'lineage_proportion', 'dist_lineage_proportion', 'lineage_count_all', 'lineage_count_unique'], dtype='object')), ('cat', OneHotEncoder(), Index(['ss_class', 'aa_prop_change', 'electrostatics_change', 'polarity_change', 'water_change', 'drtype_mode_labels', 'active_site'], dtype='object'))])), ('model', RidgeClassifierCV(cv=10))]) key: fit_time value: [0.07331181 0.06256533 0.06086087 0.06129169 0.06339002 0.06102061 0.06144905 0.06290483 0.06220293 0.06425667] mean value: 0.0633253812789917 key: score_time value: [0.00836086 0.00896025 0.00843644 0.00837779 0.00837159 0.00880218 0.00834155 0.00841522 0.00857377 0.00888371] mean value: 0.008552336692810058 key: test_mcc value: [0.75 0.62994079 0.73214286 0.66143783 0.87287156 0.87287156 0.64465837 0.6000992 0.64465837 0.875 ] mean value: 0.7283680535735243 key: train_mcc value: [0.83832595 0.87000211 0.88466669 0.91240409 0.86868474 0.89863497 0.90025835 0.88476385 0.9139999 0.84173622] mean value: 0.8813476865607188 key: test_accuracy value: [0.875 0.8125 0.86666667 0.8 0.93333333 0.93333333 0.8 0.8 0.8 0.93333333] mean value: 0.8554166666666667 key: train_accuracy value: [0.91911765 0.93382353 0.94160584 0.95620438 0.93430657 0.94890511 0.94890511 0.94160584 0.95620438 0.91970803] mean value: 0.940038643194504
key: test_fscore value: [0.875 0.82352941 0.85714286 0.82352941 0.92307692 0.92307692 0.84210526 0.82352941 0.84210526 0.93333333] mean value: 0.8666428798239943 key: train_fscore value: [0.91970803 0.93617021 0.94366197 0.95652174 0.9352518 0.95035461 0.95035461 0.94285714 0.95714286 0.92198582] mean value: 0.9414008786946603 key: test_precision value: [0.875 0.77777778 0.85714286 0.7 1. 1. 0.72727273 0.77777778 0.72727273 1. ] mean value: 0.8442243867243867 key: train_precision value: [0.91304348 0.90410959 0.91780822 0.95652174 0.92857143 0.93055556 0.91780822 0.91666667 0.93055556 0.89041096] mean value: 0.920605141004188 key: test_recall value: [0.875 0.875 0.85714286 1. 0.85714286 0.85714286 1. 0.875 1. 0.875 ] mean value: 0.9071428571428571 key: train_recall value: [0.92647059 0.97058824 0.97101449 0.95652174 0.94202899 0.97101449 0.98529412 0.97058824 0.98529412 0.95588235] mean value: 0.9634697357203751 key: test_roc_auc value: [0.875 0.8125 0.86607143 0.8125 0.92857143 0.92857143 0.78571429 0.79464286 0.78571429 0.9375 ] mean value: 0.8526785714285714 key: train_roc_auc value: [0.91911765 0.93382353 0.9413896 0.95620205 0.93424979 0.94874254 0.9491688 0.94181586 0.95641517 0.91997016] mean value: 0.9400895140664962 key: test_jcc value: [0.77777778 0.7 0.75 0.7 0.85714286 0.85714286 0.72727273 0.7 0.72727273 0.875 ] mean value: 0.7671608946608947 key: train_jcc value: [0.85135135 0.88 0.89333333 0.91666667 0.87837838 0.90540541 0.90540541 0.89189189 0.91780822 0.85526316] mean value: 0.8895503809505252 MCC on Blind test: 0.06 Accuracy on Blind test: 0.66 /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:203: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_CT.sort_values(by = ['test_mcc'], ascending = False, inplace = True) /home/tanu/git/LSHTM_analysis/scripts/ml/./gid_config.py:206: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy rouC_BT.sort_values(by = ['bts_mcc'], ascending = False, inplace = True)